2016-12-17 13 views

Python - pandas: transposing game log data

I have a dataset (nba_data) and I am having trouble transposing it. I would like to turn the following,

TEAM_ABBREVIATION GAME_DATE WinLoss HomeAway 
ATL     2016-10-27 W    H 
ATL     2016-10-29 W    A 
ATL     2016-10-31 W    H 
ATL     2016-11-02 L    H 
BKN     2016-10-26 L    A 
BKN     2016-10-28 W    H 
BKN     2016-10-29 L    A 
BKN     2016-10-31 L    H 

into the following, where each count is a running total per team:

TEAM_ABBREVIATION GAME_DATE HomeWin HomeLoss AwayWin AwayLoss 
ATL    2016-10-27  1  0   0  0 
ATL    2016-10-29  1  0   1  0 
ATL    2016-10-31  2  0   1  0 
ATL    2016-11-02  2  1   1  0 
BKN    2016-10-26  0  0   0  1 
BKN    2016-10-28  1  0   0  1 
BKN    2016-10-29  1  0   0  2 
BKN    2016-10-31  1  1   0  2 

Any help would be greatly appreciated.

Thanks, Tom

Answers

import pandas as pd 

df = pd.DataFrame({'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02', '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'], 'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'], 'TEAM_ABBREVIATION': ['ATL', 'ATL', 'ATL', 'ATL', 'BKN', 'BKN', 'BKN', 'BKN'], 'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L']}) 

result = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int') 
result = result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum') 
result = result.sort_index(axis='columns', ascending=False) 
result = result.rename(columns={'AL':'AwayLoss', 'AW':'AwayWin', 
           'HL':'HomeLoss', 'HW':'HomeWin'}) 
result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns') 

yields

   TEAM_ABBREVIATION   GAME_DATE  HomeWin  HomeLoss  AwayWin  AwayLoss
0                ATL  2016-10-27        1         0        0         0
1                ATL  2016-10-29        1         0        1         0
2                ATL  2016-10-31        2         0        1         0
3                ATL  2016-11-02        2         1        1         0
4                BKN  2016-10-26        0         0        0         1
5                BKN  2016-10-28        1         0        0         1
6                BKN  2016-10-29        1         0        0         2
7                BKN  2016-10-31        1         1        0         2

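The first idea is that there are 4 kinds of "events", corresponding to the 4 possible combinations of values in the WinLoss and HomeAway columns: (W, H), (W, A), (L, H), (L, A).

So it is natural to combine the WinLoss and HomeAway columns into a single column: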

In [111]: df['HomeAway'] + df['WinLoss'] 
Out[111]: 
0 HW 
1 AW 
2 HW 
3 HL 
4 AL 
5 HW 
6 AL 
7 HL 
dtype: object 

Then use get_dummies to convert this Series into a table of 1s and 0s:

In [112]: pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype('int') 
Out[112]: 
    AL AW HL HW 
0 0 0 0 1 
1 0 1 0 0 
2 0 0 0 1 
3 0 0 1 0 
4 1 0 0 0 
5 0 0 0 1 
6 1 0 0 0 
7 0 0 1 0 
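One caveat worth illustrating: get_dummies only creates columns for values that actually appear, so a dataset in which some combination never occurs would be missing that column. A minimal sketch of one way around this, assuming the four event codes are known in advance (the sample Series here is made up for illustration and is not from the question):

```python
import pandas as pd

# Hypothetical sample in which no away win ('AW') ever occurs
events = pd.Series(['HW', 'HL', 'AL', 'HW'])

# Declaring all four categories up front makes get_dummies emit a column
# for each of them, including the unobserved 'AW'
all_events = ['AL', 'AW', 'HL', 'HW']
events = pd.Series(pd.Categorical(events, categories=all_events))

dummies = pd.get_dummies(events).astype(int)
print(dummies)
```

With the categories declared, the 'AW' column is present and filled with zeros, so the later rename step always sees all four columns.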

Now, comparing this with your desired result, we can see that we also want a cumulative sum, grouped by TEAM_ABBREVIATION:

In [114]: result.groupby(df['TEAM_ABBREVIATION']).transform('cumsum') 
Out[114]: 
    AL AW HL HW 
0 0 0 0 1 
1 0 1 0 1 
2 0 1 0 2 
3 0 1 1 2 
4 1 0 0 0 
5 1 0 0 1 
6 2 0 0 1 
7 2 0 1 1 
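The cumulative sum follows row order, so the running totals are only chronological if each team's games are already sorted by date, as they are in the question. A small sketch with deliberately shuffled, made-up rows, assuming GAME_DATE holds ISO-formatted strings (which sort correctly as text); it also uses the groupby cumsum() shortcut, which should give the same result as transform('cumsum'):

```python
import pandas as pd

# Made-up rows, deliberately out of chronological order
df = pd.DataFrame({
    'TEAM_ABBREVIATION': ['ATL', 'BKN', 'ATL', 'BKN'],
    'GAME_DATE': ['2016-10-29', '2016-10-28', '2016-10-27', '2016-10-26'],
    'HomeAway': ['A', 'H', 'H', 'A'],
    'WinLoss': ['W', 'W', 'W', 'L'],
})

# Sort by team, then date, so the running totals follow the schedule
df = df.sort_values(['TEAM_ABBREVIATION', 'GAME_DATE']).reset_index(drop=True)

dummies = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype(int)

# Running count of each event type within each team
running = dummies.groupby(df['TEAM_ABBREVIATION']).cumsum()
print(running)
```

Resetting the index after sorting keeps the row labels aligned with the positions used later when the pieces are concatenated.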

The next two lines of the solution reorder and rename the columns:

result = result.sort_index(axis='columns', ascending=False) 
result = result.rename(columns={'AL':'AwayLoss', 'AW':'AwayWin', 
           'HL':'HomeLoss', 'HW':'HomeWin'}) 
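The descending sort works here because reverse alphabetical order of the codes (HW, HL, AW, AL) happens to match the desired layout. If a different order were wanted, one possible alternative is to spell the order out explicitly. A sketch, using a small stand-in table rather than the real data, and a hypothetical labels dict whose key order defines the column order:

```python
import pandas as pd

# Stand-in for the cumulative table (columns in alphabetical order)
result = pd.DataFrame({'AL': [0, 0], 'AW': [0, 1], 'HL': [0, 0], 'HW': [1, 1]})

# Dict insertion order doubles as the desired column order (Python 3.7+)
labels = {'HW': 'HomeWin', 'HL': 'HomeLoss', 'AW': 'AwayWin', 'AL': 'AwayLoss'}

# Select the columns in that order, then rename them
result = result[list(labels)].rename(columns=labels)
print(result.columns.tolist())
```

This keeps the ordering and the renaming in one place, at the cost of raising a KeyError if any of the four codes is absent from the table.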

Finally, we can use pd.concat to join the TEAM_ABBREVIATION and GAME_DATE columns of df with result and build the desired DataFrame:

result = pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], result], axis='columns') 
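For reuse, the steps above can be wrapped in a single function. This is only a sketch: the name running_records is hypothetical, and it assumes the input has the four columns from the question with each team's rows already in date order.

```python
import pandas as pd

def running_records(df):
    """Hypothetical helper: per-team running home/away win/loss counts."""
    labels = {'HW': 'HomeWin', 'HL': 'HomeLoss',
              'AW': 'AwayWin', 'AL': 'AwayLoss'}
    # One indicator column per (HomeAway, WinLoss) combination
    dummies = pd.get_dummies(df['HomeAway'] + df['WinLoss']).astype(int)
    # Running totals within each team
    counts = dummies.groupby(df['TEAM_ABBREVIATION']).cumsum()
    # Reorder to HW, HL, AW, AL and give the columns readable names
    counts = counts.sort_index(axis='columns', ascending=False).rename(columns=labels)
    return pd.concat([df[['TEAM_ABBREVIATION', 'GAME_DATE']], counts], axis='columns')

# The sample data from the question
nba_data = pd.DataFrame({
    'TEAM_ABBREVIATION': ['ATL', 'ATL', 'ATL', 'ATL', 'BKN', 'BKN', 'BKN', 'BKN'],
    'GAME_DATE': ['2016-10-27', '2016-10-29', '2016-10-31', '2016-11-02',
                  '2016-10-26', '2016-10-28', '2016-10-29', '2016-10-31'],
    'WinLoss': ['W', 'W', 'W', 'L', 'L', 'W', 'L', 'L'],
    'HomeAway': ['H', 'A', 'H', 'H', 'A', 'H', 'A', 'H'],
})

out = running_records(nba_data)
print(out)
```

Called on the question's sample data, this should reproduce the table requested in the question.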

This 'get_dummies' approach is really nice!