2017-07-11 52 views
2

我有一個融化的DataFrame我想旋轉但不能使用2列作爲索引。有多個索引的熊貓 - 拆散/數據透視

import pandas as pd 
df = pd.DataFrame({'A': {0: 'XYZ', 1: 'XYZ', 2: 'XYZ', 3: 'XYZ', 4: 'XYZ', 5: 'XYZ', 6: 'XYZ', 7: 'XYZ', 8: 'XYZ', 9: 'XYZ', 10: 'ABC', 11: 'ABC', 12: 'ABC', 13: 'ABC', 14: 'ABC', 15: 'ABC', 16: 'ABC', 17: 'ABC', 18: 'ABC', 19: 'ABC'}, 'B': {0: '01/01/2017', 1: '02/01/2017', 2: '03/01/2017', 3: '04/01/2017', 4: '05/01/2017', 5: '01/01/2017', 6: '02/01/2017', 7: '03/01/2017', 8: '04/01/2017', 9: '05/01/2017', 10: '01/01/2017', 11: '02/01/2017', 12: '03/01/2017', 13: '04/01/2017', 14: '05/01/2017', 15: '01/01/2017', 16: '02/01/2017', 17: '03/01/2017', 18: '04/01/2017', 19: '05/01/2017'}, 'C': {0: 'Price', 1: 'Price', 2: 'Price', 3: 'Price', 4: 'Price', 5: 'Trading', 6: 'Trading', 7: 'Trading', 8: 'Trading', 9: 'Trading', 10: 'Price', 11: 'Price', 12: 'Price', 13: 'Price', 14: 'Price', 15: 'Trading', 16: 'Trading', 17: 'Trading', 18: 'Trading', 19: 'Trading'}, 'D': {0: '100', 1: '101', 2: '102', 3: '103', 4: '104', 5: 'Yes', 6: 'Yes', 7: 'Yes', 8: 'Yes', 9: 'Yes', 10: '50', 11: nan, 12: '48', 13: '47', 14: '46', 15: 'Yes', 16: 'No', 17: 'Yes', 18: 'Yes', 19: 'Yes'}}) 

所以:

A B C D 
XYZ 01/01/2017 Price 100 
XYZ 02/01/2017 Price 101 
XYZ 03/01/2017 Price 102 
XYZ 04/01/2017 Price 103 
XYZ 05/01/2017 Price 104 
XYZ 01/01/2017 Trading Yes 
XYZ 02/01/2017 Trading Yes 
XYZ 03/01/2017 Trading Yes 
XYZ 04/01/2017 Trading Yes 
XYZ 05/01/2017 Trading Yes 
ABC 01/01/2017 Price 50 
ABC 02/01/2017 Price 
ABC 03/01/2017 Price 48 
ABC 04/01/2017 Price 47 
ABC 05/01/2017 Price 46 
ABC 01/01/2017 Trading Yes 
ABC 02/01/2017 Trading No 
ABC 03/01/2017 Trading Yes 
ABC 04/01/2017 Trading Yes 
ABC 05/01/2017 Trading Yes 

將成爲:

A B Trading Price 
ABC 01/01/2017 Yes 50 
    02/01/2017 No 
    03/01/2017 Yes 48 
    04/01/2017 Yes 47 
    05/01/2017 Yes 46 
XYZ 01/01/2017 Yes 100 
    02/01/2017 Yes 101 
    03/01/2017 Yes 102 
    04/01/2017 Yes 103 
    05/01/2017 Yes 104 

或:

ABC  XYZ 
    Trading Price Trading Price 
01/01/2017 Yes 50 Yes 100 
02/01/2017 No  Yes 101 
03/01/2017 Yes 48 Yes 102 
04/01/2017 Yes 47 Yes 103 
05/01/2017 Yes 46 Yes 104 

我認爲這可以簡單地用旋轉完成,但得到一個錯誤:

df.pivot(index=['A', 'B'], columns = ['C'], values = ['D']) 
Traceback (most recent call last): 

    File "<ipython-input-41-afcc34979ff8>", line 1, in <module> 
    df.pivot(index=['A', 'B'], columns = ['C'], values = ['D']) 

    File "C:\Miniconda\lib\site-packages\pandas\core\frame.py", line 3951, in pivot 
    return pivot(self, index=index, columns=columns, values=values) 

    File "C:\Miniconda\lib\site-packages\pandas\core\reshape\reshape.py", line 377, in pivot 
    index=MultiIndex.from_arrays([index, self[columns]])) 

    File "C:\Miniconda\lib\site-packages\pandas\core\series.py", line 248, in __init__ 
    raise_cast_failure=True) 

    File "C:\Miniconda\lib\site-packages\pandas\core\series.py", line 3027, in _sanitize_array 
    raise Exception('Data must be 1-dimensional') 

Exception: Data must be 1-dimensional 

在R這將很快完成收集/傳播。

謝謝!

回答

2

這就是你想要的嗎?

In [23]: df.pivot_table(index=['A','B'], columns='C', values='D', aggfunc='first') 
Out[23]: 
C    Price Trading 
A B 
ABC 01/01/2017 50  Yes 
    02/01/2017 NaN  No 
    03/01/2017 48  Yes 
    04/01/2017 47  Yes 
    05/01/2017 46  Yes 
XYZ 01/01/2017 100  Yes 
    02/01/2017 101  Yes 
    03/01/2017 102  Yes 
    04/01/2017 103  Yes 
    05/01/2017 104  Yes 
+0

非常感謝!我想我錯過了'aggfunc'的觀點,這是造成這些問題的原因。我用unstack()發佈了一個答案,但你的更優雅。 – Yona

+0

'df.pivot_table(index = ['B'],columns = ['A','C'],values ='D',aggfunc ='first')'將是我的另一個問題的答案。 – Yona

+0

@Yona,很高興我能幫到:) – MaxU

1

我發現下面是可能的:

df.set_index(['A', 'C', 'B']).unstack().T 
Out[59]: 
A    ABC   XYZ   
C   Price Trading Price Trading 
    B          
D 01/01/2017 50  Yes 100  Yes 
    02/01/2017 NaN  No 101  Yes 
    03/01/2017 48  Yes 102  Yes 
    04/01/2017 47  Yes 103  Yes 
    05/01/2017 46  Yes 104  Yes 

和:

df.set_index(['A', 'B', 'C']).unstack() 
Out[61]: 
        D   
C    Price Trading 
A B      
ABC 01/01/2017 50  Yes 
    02/01/2017 NaN  No 
    03/01/2017 48  Yes 
    04/01/2017 47  Yes 
    05/01/2017 46  Yes 
XYZ 01/01/2017 100  Yes 
    02/01/2017 101  Yes 
    03/01/2017 102  Yes 
    04/01/2017 103  Yes 
    05/01/2017 104  Yes 
+1

你可以用這個:'df.set_index(['A','B','C'])['D']。 ' –