2016-05-26 67 views

I have a DataFrame like the one below and need to change its format. How should I do that?

In [14]: grouped_data 
Out[14]: 
    monthyear Facility  Date  Yield 
0 Dec 15  CCM1 2015-12-01 2550.000000 
1 Feb 16  CCM1 2016-02-01 4250.000000 
2 Jan 16  CCM2 2016-01-01 1540.000000 
3 Jan 16  CCM3 2016-01-01 6800.000000 
4 Nov 15  CCM1 2015-11-01 921.458157 
5 Nov 15  CCM2 2015-11-01 1750.310038 
6 Sep 15  CCM3 2015-09-01 5191.197065 

Now I need a DataFrame that looks like this:

  monthyear         CCM1         CCM2         CCM3        Date
0    Dec 15  2550.000000            0            0  2015-12-01
1    Feb 16  4250.000000            0            0  2016-02-01
2    Jan 16            0  1540.000000  6800.000000  2016-01-01
3    Nov 15   921.458157  1750.310038            0  2015-11-01
4    Sep 15            0            0  5191.197065  2015-09-01

How can I do this with pandas? Please help. Thanks in advance.

Answers


Use pivot_table:

print (df.pivot_table(index=['monthyear','Date'], 
         columns='Facility', 
         values='Yield', 
         fill_value=0)) 

Facility      CCM1   CCM2   CCM3 
monthyear Date            
Dec 15 2015-12-01 2550.000000  0.000000  0.000000 
Feb 16 2016-02-01 4250.000000  0.000000  0.000000 
Jan 16 2016-01-01  0.000000 1540.000000 6800.000000 
Nov 15 2015-11-01 921.458157 1750.310038  0.000000 
Sep 15 2015-09-01  0.000000  0.000000 5191.197065 

If you also want to reset the index and remove the columns name, use reset_index and rename_axis (new in pandas 0.18.0):

print (df.pivot_table(index=['monthyear','Date'], 
         columns='Facility', 
         values='Yield', 
         fill_value=0).reset_index().rename_axis(None, axis=1)) 

    monthyear  Date   CCM1   CCM2   CCM3 
0 Dec 15 2015-12-01 2550.000000  0.000000  0.000000 
1 Feb 16 2016-02-01 4250.000000  0.000000  0.000000 
2 Jan 16 2016-01-01  0.000000 1540.000000 6800.000000 
3 Nov 15 2015-11-01 921.458157 1750.310038  0.000000 
4 Sep 15 2015-09-01  0.000000  0.000000 5191.197065 

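pivot_table aggregates with aggfunc, which defaults to aggfunc=np.mean, when several rows share the same monthyear and Date for one Facility. A better explanation with a sample is here and in the docs.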
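To illustrate how aggfunc handles duplicates, here is a minimal sketch. The frame below is a hypothetical sample: it reuses a few rows shaped like the question's data, with made-up Yield values and one extra duplicate row for Nov 15 / CCM1 that is not in the original data. The helper name pivot_yield is also just for illustration.

```python
import pandas as pd

# Hypothetical sample: two rows share (Nov 15, 2015-11-01, CCM1),
# so pivot_table has to aggregate them into a single cell.
df = pd.DataFrame({
    "monthyear": ["Dec 15", "Nov 15", "Nov 15", "Nov 15"],
    "Facility": ["CCM1", "CCM1", "CCM1", "CCM2"],
    "Date": pd.to_datetime(
        ["2015-12-01", "2015-11-01", "2015-11-01", "2015-11-01"]
    ),
    "Yield": [2550.0, 900.0, 1100.0, 1750.0],
})


def pivot_yield(frame, aggfunc):
    """Pivot Facility into columns, then flatten the result."""
    return (
        frame.pivot_table(
            index=["monthyear", "Date"],
            columns="Facility",
            values="Yield",
            aggfunc=aggfunc,
            fill_value=0,
        )
        .reset_index()
        .rename_axis(None, axis=1)
    )


# "mean" matches the default behaviour: 900 and 1100 are averaged.
by_mean = pivot_yield(df, "mean")
# "sum" adds the duplicate rows instead.
by_sum = pivot_yield(df, "sum")

print(by_mean)
print(by_sum)
```

If the duplicates should be totalled rather than averaged, passing aggfunc="sum" as above is one option. Which one is right depends on what Yield means in your data.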


WOW ... @jezrael thank you very much for your answer ... you saved my day.
