2016-09-15 100 views
4

Convert an entire DataFrame from lowercase to uppercase with pandas

I have a DataFrame like the one shown below:

import pandas as pd

# Create an example dataframe about a fictional army
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'deaths', 'battles', 'size'])


My goal is to convert every string inside the DataFrame to uppercase.

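Note: the data types and the objects must not change; the output must contain all of the objects. I want to avoid converting each column one by one. I would like to do it generally, over the whole DataFrame.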

This is what I have tried so far, without success:

df.str.upper() 

'str' only works on Series... – IanS
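To illustrate the comment above, here is a minimal sketch of the difference. The two-column frame is a cut-down stand-in for the question's data, and the variable names are only for illustration.

```python
import pandas as pd

# Cut-down stand-in for the question's DataFrame (illustrative only).
df = pd.DataFrame({"company": ["1st", "1st", "2nd", "2nd"],
                   "size": ["l", "ll", "l", "m"]})

# A single column is a Series, and a Series exposes the .str accessor.
upper_sizes = df["size"].str.upper()
print(upper_sizes.tolist())  # ['L', 'LL', 'L', 'M']

# The DataFrame itself has no .str attribute, so df.str.upper() fails.
try:
    df.str.upper()
    error = None
except AttributeError as exc:
    error = exc
print(type(error).__name__)
```

This is why the answers below go through the columns one at a time, either with apply or with an explicit loop.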

Answers

12

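astype(str) casts each Series to strings (dtype object). The .str accessor on the converted Series then gives access to the string methods, and upper() is called through it. Note that after this, the dtype of every column changes to object, and values that were numbers (such as 52) become strings ('52').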

In [17]: df 
Out[17]: 
    regiment company deaths battles size 
0 Nighthawks  1st kkk  5 l 
1 Nighthawks  1st  52  42 ll 
2 Nighthawks  2nd  25  2 l 
3 Nighthawks  2nd 616  2 m 

In [18]: df.apply(lambda x: x.astype(str).str.upper()) 
Out[18]: 
    regiment company deaths battles size 
0 NIGHTHAWKS  1ST KKK  5 L 
1 NIGHTHAWKS  1ST  52  42 LL 
2 NIGHTHAWKS  2ND  25  2 L 
3 NIGHTHAWKS  2ND 616  2 M 

Afterwards, you can convert the 'battles' column back to numeric using to_numeric():

In [42]: df2 = df.apply(lambda x: x.astype(str).str.upper()) 

In [43]: df2['battles'] = pd.to_numeric(df2['battles']) 

In [44]: df2 
Out[44]: 
    regiment company deaths battles size 
0 NIGHTHAWKS  1ST KKK  5 L 
1 NIGHTHAWKS  1ST  52  42 LL 
2 NIGHTHAWKS  2ND  25  2 L 
3 NIGHTHAWKS  2ND 616  2 M 

In [45]: df2.dtypes 
Out[45]: 
regiment object 
company  object 
deaths  object 
battles  int64 
size  object 
dtype: object 
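If the non-string values have to stay exactly as they are, as the question's note asks, one possible alternative is to upper-case only the values that are already str and pass everything else through untouched. This is a sketch rather than part of the answer above, and upper_if_str is a hypothetical helper name.

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data)


def upper_if_str(value):
    """Hypothetical helper: upper-case str values, return anything else unchanged."""
    return value.upper() if isinstance(value, str) else value


# Series.map applies the helper element by element; apply runs it per column.
preserved = df.apply(lambda col: col.map(upper_if_str))

print(preserved)
# Show which Python type each value in 'deaths' ended up with.
print(preserved['deaths'].map(lambda v: type(v).__name__).tolist())
```

The trade-off is speed: this calls a Python function for every cell, so it will likely be slower on large frames than the vectorised .str.upper() route, but it does not turn 52 into '52'.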

Posting only code is not very useful on SO. Can you explain your answer and, if possible, provide a breakdown of the steps? – EdChum

2

Since str only works on Series, you can apply it to each column individually and then concatenate the results:

In [6]: pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1) 
Out[6]: 
    regiment company deaths battles size 
0 NIGHTHAWKS  1ST KKK  5 L 
1 NIGHTHAWKS  1ST  52  42 LL 
2 NIGHTHAWKS  2ND  25  2 L 
3 NIGHTHAWKS  2ND 616  2 M 

Edit: performance comparison

In [10]: %timeit df.apply(lambda x: x.astype(str).str.upper()) 
100 loops, best of 3: 3.32 ms per loop 

In [11]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1) 
100 loops, best of 3: 3.32 ms per loop 

Both answers perform the same on a small DataFrame.

In [15]: df = pd.concat(10000 * [df]) 

In [16]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1) 
10 loops, best of 3: 104 ms per loop 

In [17]: %timeit df.apply(lambda x: x.astype(str).str.upper()) 
10 loops, best of 3: 130 ms per loop 

On a large DataFrame, my answer is slightly faster.
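For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard-library timeit module. The function names, the repeat count of 1000 and the number of timed calls are my own choices, and ignore_index=True is added so the enlarged frame has a unique index. Timings will differ by machine and pandas version, so no numbers are claimed here.

```python
import timeit

import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
small = pd.DataFrame(raw_data)
# Enlarge the frame; 1000 repeats is an arbitrary choice for this sketch.
large = pd.concat([small] * 1000, ignore_index=True)


def with_apply(frame):
    """Column-wise apply, as in the first answer."""
    return frame.apply(lambda col: col.astype(str).str.upper())


def with_concat(frame):
    """Per-column list comprehension plus concat, as in the second answer."""
    return pd.concat([frame[col].astype(str).str.upper() for col in frame.columns],
                     axis=1)


# Both approaches should produce the same DataFrame.
same_result = with_apply(large).equals(with_concat(large))
print("identical results:", same_result)

calls = 5
for name, func in [("apply", with_apply), ("concat", with_concat)]:
    seconds = timeit.timeit(lambda: func(large), number=calls)
    print(f"{name}: {seconds / calls * 1000:.2f} ms per call")
```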