Convert a whole dataframe from lower case to upper case with Pandas

I have a dataframe like the one shown below:

import pandas as pd

# Create an example dataframe about a fictional army
# (the 'deaths' and 'battles' columns deliberately mix strings and integers)
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'deaths', 'battles', 'size'])

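My goal is to convert every string inside the dataframe to upper case so that it looks like this:

Note: all data types are objects and must not be changed; the output must contain all objects. I want to avoid converting every single column one by one... I would like to do it generally over the whole dataframe.

What I have tried so far is this, but without success: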

df.str.upper()


2016-09-15 Federico Gentile

'str' only works on Series... – IanS
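To illustrate the comment above, here is a minimal sketch on a cut-down, two-column version of the example data (the variable names `upper_size` and `frame_has_str` are made up for illustration). It shows that the `.str` accessor exists on a Series but not on a DataFrame:

```python
import pandas as pd

# Cut-down version of the example dataframe, for illustration only
df = pd.DataFrame({'regiment': ['Nighthawks', 'Nighthawks'],
                   'size': ['l', 'll']})

# A single column is a Series, so the .str accessor is available
upper_size = df['size'].str.upper()

# A DataFrame has no .str accessor, so the attribute lookup fails
try:
    df.str.upper()
    frame_has_str = True
except AttributeError:
    frame_has_str = False

print(upper_size.tolist())
print(frame_has_str)
```

This is why the attempt in the question fails: the string methods have to be reached column by column, either explicitly or through something like `apply`.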

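astype() casts each Series to strings (dtype object); the .str accessor on the converted Series then exposes the string methods, and upper() is applied to every value. Note that after this, the dtype of all columns will be object.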

In [17]: df
Out[17]:
     regiment company deaths battles size
0  Nighthawks     1st    kkk       5    l
1  Nighthawks     1st     52      42   ll
2  Nighthawks     2nd     25       2    l
3  Nighthawks     2nd    616       2    m

In [18]: df.apply(lambda x: x.astype(str).str.upper())
Out[18]:
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M
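The astype(str) approach turns every value into a string, including the integers. If the aim is to leave non-string values untouched, as the note in the question suggests, one possible alternative is to upper-case only the values that are already strings. This is a sketch rather than part of the original answer, and `upper_if_str` and `df_upper` are hypothetical names:

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data)

def upper_if_str(value):
    """Upper-case strings; return any other value unchanged."""
    return value.upper() if isinstance(value, str) else value

# Apply the helper element by element, one column at a time
df_upper = df.apply(lambda col: col.map(upper_if_str))

print(df_upper)
```

With this variant the integers in 'deaths' and 'battles' stay integers, at the cost of a Python-level function call per cell, which is likely to be slower than the vectorised .str.upper() on large frames.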

Later you can convert the 'battles' column back to numeric using to_numeric():

In [42]: df2 = df.apply(lambda x: x.astype(str).str.upper()) 

In [43]: df2['battles'] = pd.to_numeric(df2['battles']) 

In [44]: df2
Out[44]:
     regiment company deaths  battles size
0  NIGHTHAWKS     1ST    KKK        5    L
1  NIGHTHAWKS     1ST     52       42   LL
2  NIGHTHAWKS     2ND     25        2    L
3  NIGHTHAWKS     2ND    616        2    M

In [45]: df2.dtypes
Out[45]:
regiment    object
company     object
deaths      object
battles      int64
size        object
dtype: object
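The 'battles' column converts cleanly because every value in it looks like a number. The 'deaths' column would not, since it contains 'KKK'. As a hedged sketch (not part of the original answer), passing errors='coerce' to to_numeric() replaces unparseable values with NaN instead of raising; the Series below simply re-creates the upper-cased 'deaths' values:

```python
import pandas as pd

# The upper-cased 'deaths' column from the example, re-created by hand
deaths = pd.Series(['KKK', '52', '25', '616'])

# With the default errors='raise', 'KKK' would trigger a ValueError.
# errors='coerce' turns unparseable entries into NaN instead.
deaths_numeric = pd.to_numeric(deaths, errors='coerce')

print(deaths_numeric)
```

Because NaN is a float, the resulting column is float64 rather than int64.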


2016-09-15 13:19:39

Posting just code is not very useful on SO; can you explain your answer and the steps involved, and if possible the failure points? – EdChum

Since str only works on Series, you can apply it to each column individually and then concatenate:

In [6]: pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
Out[6]:
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M

EDIT: performance comparison

In [10]: %timeit df.apply(lambda x: x.astype(str).str.upper()) 
100 loops, best of 3: 3.32 ms per loop 

In [11]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1) 
100 loops, best of 3: 3.32 ms per loop

Both answers perform equally on a small dataframe.

In [15]: df = pd.concat(10000 * [df]) 

In [16]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1) 
10 loops, best of 3: 104 ms per loop 

In [17]: %timeit df.apply(lambda x: x.astype(str).str.upper()) 
10 loops, best of 3: 130 ms per loop

On a large dataframe my answer is slightly faster (104 ms versus 130 ms per loop).
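For anyone wanting to repeat the comparison outside IPython, here is a rough sketch using the standard-library timeit module. The repeat factor of 1000, the use of ignore_index=True and the names `via_apply`, `via_concat`, `t_apply` and `t_concat` are choices made for this sketch, not taken from the timings above, and the absolute numbers will depend on the machine and the pandas version:

```python
import timeit

import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
small = pd.DataFrame(raw_data)

# Build a larger frame by stacking copies; ignore_index gives a clean 0..n-1 index
big = pd.concat(1000 * [small], ignore_index=True)

def via_apply(frame):
    # apply-based answer: cast each column to str, then upper-case it
    return frame.apply(lambda x: x.astype(str).str.upper())

def via_concat(frame):
    # concat-based answer: convert each column separately, then join them
    return pd.concat([frame[col].astype(str).str.upper() for col in frame.columns], axis=1)

# Total seconds for 3 runs of each approach
t_apply = timeit.timeit(lambda: via_apply(big), number=3)
t_concat = timeit.timeit(lambda: via_concat(big), number=3)

print(f"apply:  {t_apply:.4f} s for 3 runs")
print(f"concat: {t_concat:.4f} s for 3 runs")
```

Both functions should return identical frames, so any difference in timing is down to how the columns are iterated and reassembled.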


2016-09-15 13:23:52 IanS
