2017-03-01

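Pandas df.pivot() and pd.pivot_table() with multiple indexes

I am trying to convert a long-format df with multiple index columns into a wide-format df. Why does df_in.pivot() fail, and/or why does pd.pivot_table return a result with a strange hierarchical index, so that I cannot access the columns I want to cast?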

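import numpy as np
import pandas as pd

# input table (Python 3: range() must be wrapped in list() before repeating it)
df_in = pd.DataFrame({'idx1': list(range(2))*4, 'idx2': ['a']*4+['b']*4, 'field': ['f1']*2+['f2']*2+['f1']*2+['f2']*2, 'value': np.array(list(range(2))*4)*2+1})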
''' 
    field idx1 idx2 value 
0 f1  0 a  1 
1 f1  1 a  3 
2 f2  0 a  1 
3 f2  1 a  3 
4 f1  0 b  1 
5 f1  1 b  3 
6 f2  0 b  1 
7 f2  1 b  3 
''' 

# want something like this (one column per value of 'field')
pd.DataFrame({'idx1': list(range(2))*2, 'idx2': ['a']*2+['b']*2, 'f1': [1,3]*2, 'f2': [1,3]*2})
'''
    f1 f2 idx1 idx2
0 1 1  0 a
1 3 3  1 a
2 1 1  0 b
3 3 3  1 b
'''

#doesn't work => ValueError: all arrays must be same length 
df_in.pivot(index=['idx1','idx2'], columns =['field']) 
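The ValueError arises because, in the pandas releases of that time, pivot did not accept a list for index= (I believe lists are accepted from pandas 1.1 onward). A version-independent sketch of the same reshape, assuming df_in is rebuilt as in the question and using the illustrative name wide, moves all three keys into the index and then unstacks the field level:

```python
import numpy as np
import pandas as pd

# Rebuild the example input from the question (Python 3 syntax).
df_in = pd.DataFrame({
    'idx1': list(range(2)) * 4,
    'idx2': ['a'] * 4 + ['b'] * 4,
    'field': ['f1'] * 2 + ['f2'] * 2 + ['f1'] * 2 + ['f2'] * 2,
    'value': np.array(list(range(2)) * 4) * 2 + 1,
})

# Put every key into a MultiIndex, select the single 'value' column
# (a Series), then move the 'field' level out into the columns.
wide = (df_in.set_index(['idx1', 'idx2', 'field'])['value']
             .unstack('field')
             .reset_index())
wide.columns.name = None  # drop the leftover 'field' label on the column axis
print(wide)
```

This only works when each (idx1, idx2, field) combination is unique, which holds here; with duplicates, unstack raises and pivot_table (which aggregates) is the better fit.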

#doesn't work => weird hierarchical index 
pd.pivot_table(df_in, index=['idx1','idx2'], columns =['field']) 

''' 
      value 
field  f1 f2 
idx1 idx2   
0 a  1 1 
    b  1 1 
1 a  3 3 
    b  3 3 
''' 
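The "weird" index is a two-level column MultiIndex: when values= is omitted, pivot_table aggregates every remaining column (here only 'value') and keeps that column's name as the outer level. A minimal sketch, assuming the same df_in and the illustrative names out and flat, inspects the column keys and drops the outer level:

```python
import numpy as np
import pandas as pd

df_in = pd.DataFrame({
    'idx1': list(range(2)) * 4,
    'idx2': ['a'] * 4 + ['b'] * 4,
    'field': ['f1'] * 2 + ['f2'] * 2 + ['f1'] * 2 + ['f2'] * 2,
    'value': np.array(list(range(2)) * 4) * 2 + 1,
})

# Without values=..., the column keys are (source column, field value) tuples.
out = pd.pivot_table(df_in, index=['idx1', 'idx2'], columns=['field'])
print(out.columns.tolist())  # [('value', 'f1'), ('value', 'f2')]

# Drop the outer 'value' level so the columns become plain 'f1' and 'f2',
# then turn idx1/idx2 back into ordinary columns.
flat = out.droplevel(0, axis=1).reset_index()
print(flat)
```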
# doesn't work => KeyError: 'f1' 
pd.pivot_table(df_in, index=['idx1','idx2'], columns =['field'])['f1'] 

# doesn't work => KeyError: 'f1' 
pd.pivot_table(df_in, index=['idx1','idx2'], columns =['field']).reset_index()['f1'] 
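Both KeyErrors have the same cause: a single label inside [] is matched against the outer column level, which only contains 'value'; 'f1' lives on the inner 'field' level. reset_index() does not change this, it only adds ('idx1', '') and ('idx2', '') columns. A small sketch, assuming the same df_in and the illustrative names below, shows two ways to reach the inner level:

```python
import numpy as np
import pandas as pd

df_in = pd.DataFrame({
    'idx1': list(range(2)) * 4,
    'idx2': ['a'] * 4 + ['b'] * 4,
    'field': ['f1'] * 2 + ['f2'] * 2 + ['f1'] * 2 + ['f2'] * 2,
    'value': np.array(list(range(2)) * 4) * 2 + 1,
})

out = pd.pivot_table(df_in, index=['idx1', 'idx2'], columns=['field'])

# The outer column level holds only 'value', so out['f1'] raises KeyError.
print(out.columns.get_level_values(0).unique().tolist())  # ['value']

# Option 1: go through the outer level first, then pick 'f1'.
f1_via_outer = out['value']['f1']

# Option 2: take a cross-section on the inner 'field' level by name;
# the result is a one-column DataFrame whose column is 'value'.
f1_via_xs = out.xs('f1', axis=1, level='field')
```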

Answer

To avoid multi-level columns, it is enough to specify the values column explicitly as a string rather than as a list:

df_in.pivot_table(values='value', index=['idx1', 'idx2'], columns='field').reset_index() 

#field idx1 idx2 f1 f2 
#0  0  a 1 1 
#1  0  b 1 1 
#2  1  a 3 3 
#3  1  b 3 3 
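The result above still carries the label field on its column axis (visible as "field" in the printed header). If you want the output to look exactly like a plain DataFrame, a sketch assuming the same df_in (the name wide is illustrative) chains rename_axis to clear that label:

```python
import numpy as np
import pandas as pd

df_in = pd.DataFrame({
    'idx1': list(range(2)) * 4,
    'idx2': ['a'] * 4 + ['b'] * 4,
    'field': ['f1'] * 2 + ['f2'] * 2 + ['f1'] * 2 + ['f2'] * 2,
    'value': np.array(list(range(2)) * 4) * 2 + 1,
})

# values= given as a string -> single-level columns f1, f2;
# rename_axis(columns=None) removes the 'field' name from the column axis.
wide = (df_in.pivot_table(values='value', index=['idx1', 'idx2'], columns='field')
             .reset_index()
             .rename_axis(columns=None))
print(wide)
```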

If you do have multi-level columns, you can access them with tuples, for example:

df_out = df_in.pivot_table(values=['value'], index=['idx1', 'idx2'], columns='field') 

This gives a DataFrame with multi-level columns. To access the f1 column, you can do:

df_out[('value', 'f1')] 

which gives:

#idx1 idx2 
#0  a  1 
#  b  1 
#1  a  3 
#  b  3 
#Name: (value, f1), dtype: int64
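If tuple keys feel awkward, another common option is to flatten the column tuples into single strings once, after which ordinary string keys work. A sketch assuming the same df_in; the '_' separator and the names flat and f1 are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

df_in = pd.DataFrame({
    'idx1': list(range(2)) * 4,
    'idx2': ['a'] * 4 + ['b'] * 4,
    'field': ['f1'] * 2 + ['f2'] * 2 + ['f1'] * 2 + ['f2'] * 2,
    'value': np.array(list(range(2)) * 4) * 2 + 1,
})

df_out = df_in.pivot_table(values=['value'], index=['idx1', 'idx2'], columns='field')

# Tuple access as in the answer: one label per column level.
f1 = df_out[('value', 'f1')]

# Flatten ('value', 'f1') -> 'value_f1' so plain string keys work afterwards.
flat = df_out.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
flat = flat.reset_index()
print(flat)
```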