2015-11-05 35 views
4

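How to change the header row in a Python DataFrame

I'm having trouble changing the header row of an existing DataFrame using pandas in Python. After importing pandas and the csv file, I set the header row to None so that I can remove duplicate dates after transposing. However, this leaves me with a row header (actually an index column) that I don't want.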

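import pandas as pd

df = pd.read_csv(spreadfile, header=None)

# keep='last' replaces the removed take_last=True keyword
df2 = df.T.drop_duplicates([0], keep='last')
del df2[1]

indcol = df2.iloc[:, 0]  # .iloc replaces the removed .ix indexer
df3 = df2.reindex(indcol)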

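However, the code above, which is hardly idiomatic, fails on two counts. The index column is now the one I need, but all of the entries are now NaN. My understanding of Python is not yet good enough to recognise what it is doing. The desired output is shown below; any help would be greatly appreciated!

df2 before reindexing:
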
           0             2             3             4             5
0        NaN  XS0089553282  XS0089773484  XS0092157600  XS0092541969
1  01-May-14         131.7         165.1         151.8          88.9
3  02-May-14           131         164.9         151.7          88.5
5  05-May-14         131.1           165         151.8          88.6
7  06-May-14         129.9         163.4         151.2          87.1

df2 after reindexing:

             0    2    3    4    5
0
NaN        NaN  NaN  NaN  NaN  NaN
01-May-14  NaN  NaN  NaN  NaN  NaN
02-May-14  NaN  NaN  NaN  NaN  NaN
05-May-14  NaN  NaN  NaN  NaN  NaN
06-May-14  NaN  NaN  NaN  NaN  NaN

df2 desired:

           XS0089553282  XS0089773484  XS0092157600  XS0092541969
01-May-14         131.7         165.1         151.8          88.9
02-May-14           131         164.9         151.7          88.5
05-May-14         131.1           165         151.8          88.6
06-May-14         129.9         163.4         151.2          87.1

Answers

2

Assign directly:

indcol = df2.iloc[:, 0]  # .iloc replaces .ix, which was removed in pandas 1.0
df2.columns = indcol 

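The problem with reindex is that it looks the new labels up in the existing index and column values of your df. The values you pass in don't exist there, which is why you get all NaNs.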
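To make the difference concrete, here is a minimal sketch on a made-up two-row frame (the data and the names `reindexed` and `relabelled` are illustrative assumptions, not the asker's actual file). It contrasts `reindex`, which looks labels up, with plain assignment to `.index`, which only renames the rows.

```python
import pandas as pd

# Hypothetical miniature of the transposed frame from the question.
df2 = pd.DataFrame(
    {0: ["01-May-14", "02-May-14"], 2: [131.7, 131.0], 3: [165.1, 164.9]},
    index=[1, 3],
)

# reindex searches the existing index (1 and 3) for the new labels.
# None of the date strings is found there, so every row comes back as NaN.
reindexed = df2.reindex(df2[0])

# Assigning to .index only relabels the rows; the data stays in place.
relabelled = df2.drop(columns=0)
relabelled.index = df2[0].values
```

In this sketch `reindexed` is entirely NaN, while `relabelled` keeps the numbers and gains the dates as row labels.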

A simpler approach to what you're trying to do:

In [147]: 
# take the cols and index values of interest 
cols = df.loc[0, '2':] 
idx = df['0'].iloc[1:] 
print(cols) 
print(idx) 

2 XS0089553282 
3 XS0089773484 
4 XS0092157600 
5 XS0092541969 
Name: 0, dtype: object 

1 01-May-14 
3 02-May-14 
5 05-May-14 
7 06-May-14 
Name: 0, dtype: object 

In [157]: 
# drop the first row and the first column 
df2 = df.drop('0', axis=1).drop(0) 
# overwrite the index values 
df2.index = idx.values 
df2 

Out[157]: 
               2      3      4     5
01-May-14  131.7  165.1  151.8  88.9
02-May-14    131  164.9  151.7  88.5
05-May-14  131.1    165  151.8  88.6
06-May-14  129.9  163.4  151.2  87.1

In [158]: 
# now overwrite the column values  
df2.columns = cols.values 
df2 

Out[158]: 
           XS0089553282  XS0089773484  XS0092157600  XS0092541969
01-May-14         131.7         165.1         151.8          88.9
02-May-14           131         164.9         151.7          88.5
05-May-14         131.1           165         151.8          88.6
06-May-14         129.9         163.4         151.2          87.1
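
For readers who want the whole path from file to result, the following is a rough sketch that combines the question's transpose-and-deduplicate step with the relabelling shown above. The CSV layout (dates across the first row, a second row to discard, one row per instrument) and the sample values are assumptions inferred from the question, and it uses current pandas keywords (`keep=`, `.iloc`) rather than the older ones in the question.

```python
import io
import pandas as pd

# Assumed layout of the raw file; the values are made up for illustration.
raw = io.StringIO(
    ",01-May-14,01-May-14,02-May-14,02-May-14\n"
    "note,a,b,c,d\n"
    "XS0089553282,130.0,131.7,130.5,131\n"
    "XS0089773484,165.0,165.1,164.0,164.9\n"
)
df = pd.read_csv(raw, header=None)

# Transpose, keep the last row for each duplicated date, drop the unwanted column.
df2 = df.T.drop_duplicates(subset=[0], keep="last").drop(columns=1)

# The first row holds the column names, the first column holds the dates.
header = df2.iloc[0, 1:]
body = df2.iloc[1:].set_index(0)
body.columns = header.values
body.index.name = None

# Everything was read as text, so convert the values to numbers.
body = body.astype(float)
```

Each step returns a new object, so nothing here relies on `inplace=True`.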
0
In [310]: 
cols = df.iloc[0 , 1:] 
cols 
Out[310]: 
1 XS0089553282 
2 XS0089773484 
3 XS0092157600 
4 XS0092541969 
Name: 0, dtype: object 

In [311]: 
df.drop(0 , inplace=True) 
df 
Out[311]: 
           0      1      2      3     4
1  01-May-14  131.7  165.1  151.8  88.9
2  02-May-14    131  164.9  151.7  88.5
3  05-May-14  131.1    165  151.8  88.6
4  06-May-14  129.9  163.4  151.2  87.1

In [312]: 
df.set_index(0 , inplace=True) 
df 

Out[312]: 
0              1      2      3     4
01-May-14  131.7  165.1  151.8  88.9
02-May-14    131  164.9  151.7  88.5
05-May-14  131.1    165  151.8  88.6
06-May-14  129.9  163.4  151.2  87.1

In [315]: 
df.columns = cols 
df 
Out[315]: 
           XS0089553282  XS0089773484  XS0092157600  XS0092541969
01-May-14         131.7         165.1         151.8          88.9
02-May-14           131         164.9         151.7          88.5
05-May-14         131.1           165         151.8          88.6
06-May-14         129.9         163.4         151.2          87.1
+0

inplace=True produces an error for me: TypeError: drop() got an unexpected keyword argument 'inplace' – Oaka13

+0

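That error says the `drop` method has no argument named `inplace`, which is certainly not the case. I'm not sure what is going on here; are you sure you followed the same steps? –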

+0

I tried the same thing again. It's odd, because set_index does have an inplace parameter. The only parameters of df.drop are labels, axis and level. – Oaka13
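
If `drop` in the installed pandas really does not accept `inplace`, one possible workaround is to reassign the returned frame instead. This is a sketch on a made-up stand-in for the frame read with `header=None`; whether an old pandas release is the actual cause is an assumption, not something the thread confirms.

```python
import pandas as pd

# Hypothetical stand-in for the frame read with header=None.
df = pd.DataFrame(
    [
        [None, "XS0089553282", "XS0089773484"],
        ["01-May-14", 131.7, 165.1],
        ["02-May-14", 131.0, 164.9],
    ]
)

# Save the future column names before the first row is dropped.
cols = df.iloc[0, 1:]

# Reassigning the returned frame avoids the inplace keyword entirely.
df = df.drop(0)
df = df.set_index(0)
df.columns = cols.values
```

Printing `pd.__version__` would show which release is installed, which may help explain why the keyword is missing.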
