2016-02-19

Pandas: compute overlap with pivot_table or crosstab

I'm trying to compute the overlap of some data in a DataFrame. Here is a simple example:

import pandas as pd

df = pd.DataFrame({
    'player': ['A', 'B', 'C', 'D', 'A', 'C', 'B'],
    'game': ['gameA', 'gameB', 'gameC', 'gameC', 'gameB', 'gameD', 'gameA']})

df:

game player 
0 gameA  A 
1 gameB  B 
2 gameC  C 
3 gameC  D 
4 gameB  A 
5 gameD  C 
6 gameA  B 

What I want to do is count, for each combination of two games, the number of players who played in both.

For example, the result should look like this:

game1 game2 overlap
gameA gameB  2   # 2 players (A and B) played both gameA and gameB
gameA gameC  0
gameA gameD  0
gameB gameA  2
gameB gameC  0
gameB gameD  0
...

I could do this with a dictionary and a for-each loop, but is there a simple way to do it with pivot_table or crosstab?
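For comparison, here is a minimal sketch of what the dictionary-and-loop approach mentioned above might look like. It is not code from the thread. It assumes each (player, game) pair appears at most once, and the names players_by_game and overlap are illustrative. Players are collected into one set per game, and the sets are intersected for every ordered pair of distinct games.

```python
from collections import defaultdict
from itertools import permutations

import pandas as pd

df = pd.DataFrame({
    'player': ['A', 'B', 'C', 'D', 'A', 'C', 'B'],
    'game': ['gameA', 'gameB', 'gameC', 'gameC', 'gameB', 'gameD', 'gameA']})

# Map each game to the set of players who played it.
players_by_game = defaultdict(set)
for player, game in zip(df['player'], df['game']):
    players_by_game[game].add(player)

# For every ordered pair of distinct games, count the players they share.
games = sorted(players_by_game)
rows = [(g1, g2, len(players_by_game[g1] & players_by_game[g2]))
        for g1, g2 in permutations(games, 2)]
overlap = pd.DataFrame(rows, columns=['game1', 'game2', 'overlap'])
print(overlap)
```

This works, but it builds the pairs in a Python loop. The pandas answer pushes the same counting into merge and crosstab.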

Thanks a lot.

Answer


You can use pd.merge to create game_table by merging df with itself on player:

game_table = pd.merge(df, df, how='left', on=['player']) 
# game_x player game_y 
# 0 gameA  A gameA 
# 1 gameA  A gameB 
# 2 gameB  B gameB 
# 3 gameB  B gameA 
# 4 gameC  C gameC 
# 5 gameC  C gameD 
# 6 gameC  D gameC 
# 7 gameB  A gameA 
# 8 gameB  A gameB 
# 9 gameD  C gameC 
# 10 gameD  C gameD 
# 11 gameA  B gameB 
# 12 gameA  B gameA 

Then apply pd.crosstab to game_table:

freq = pd.crosstab(game_table['game_x'], game_table['game_y']) 
# game_y gameA gameB gameC gameD 
# game_x        
# gameA  2  2  0  0 
# gameB  2  2  0  0 
# gameC  0  0  2  1 
# gameD  0  0  1  1 
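The same game × game matrix can also be obtained without the self-merge. This is a swapped-in technique, not the one in the answer: build a player × game incidence table with pd.crosstab and multiply its transpose by itself. Entry (g1, g2) of that product is the number of players with a 1 in both columns g1 and g2. The name incidence is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'player': ['A', 'B', 'C', 'D', 'A', 'C', 'B'],
    'game': ['gameA', 'gameB', 'gameC', 'gameC', 'gameB', 'gameD', 'gameA']})

# Player x game incidence table: 1 where the player played that game.
incidence = pd.crosstab(df['player'], df['game'])

# (game x player) dot (player x game) -> game x game count of shared players.
freq_alt = incidence.T.dot(incidence)
print(freq_alt)
```

Both this and the merge approach count a player twice if the same (player, game) row occurs twice in df, so calling df.drop_duplicates() first may be wise if that can happen.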

Calling stack followed by reset_index reshapes the DataFrame into the desired form:

result = freq.stack().reset_index() 
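stack() returns a Series indexed by (game_x, game_y). Series.reset_index accepts a name= argument for the value column, so naming the count column and reshaping can be done in one call. A sketch, assuming the df from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'player': ['A', 'B', 'C', 'D', 'A', 'C', 'B'],
    'game': ['gameA', 'gameB', 'gameC', 'gameC', 'gameB', 'gameD', 'gameA']})

game_table = pd.merge(df, df, how='left', on=['player'])
freq = pd.crosstab(game_table['game_x'], game_table['game_y'])

# stack() -> Series with a (game_x, game_y) MultiIndex;
# reset_index(name=...) turns the index into columns and names the counts.
result = freq.stack().reset_index(name='overlap')
print(result.head())
```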

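Putting it all together:

import pandas as pd

df = pd.DataFrame(
    {'player': ['A', 'B', 'C', 'D', 'A', 'C', 'B'],
     'game': ['gameA', 'gameB', 'gameC', 'gameC', 'gameB', 'gameD', 'gameA']})

# Self-merge on player: one row per (game_x, game_y) pair a player took part in.
game_table = pd.merge(df, df, how='left', on=['player'])
# Count how many players link each pair of games.
freq = pd.crosstab(game_table['game_x'], game_table['game_y'])
# Reshape the matrix into long form with an 'overlap' column.
result = freq.stack()
result.name = 'overlap'
result = result.reset_index()
# Drop the diagonal (a game paired with itself).
mask = (result['game_x'] != result['game_y'])
result = result.loc[mask]
print(result)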

This produces:

game_x game_y overlap 
1 gameA gameB  2 # Because both A and B played in gameA and gameB 
2 gameA gameC  0 
3 gameA gameD  0 
4 gameB gameA  2 
6 gameB gameC  0 
7 gameB gameD  0 
8 gameC gameA  0 
9 gameC gameB  0 
11 gameC gameD  1 
12 gameD gameA  0 
13 gameD gameB  0 
14 gameD gameC  1 
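The expected output in the question lists both directions (gameA/gameB and gameB/gameA). If one row per unordered pair is enough, one option (not part of the original answer) is to filter on game_x < game_y. This string comparison keeps exactly one orientation of each pair and also drops the diagonal. A sketch, assuming the same df:

```python
import pandas as pd

df = pd.DataFrame({
    'player': ['A', 'B', 'C', 'D', 'A', 'C', 'B'],
    'game': ['gameA', 'gameB', 'gameC', 'gameC', 'gameB', 'gameD', 'gameA']})

game_table = pd.merge(df, df, how='left', on=['player'])
freq = pd.crosstab(game_table['game_x'], game_table['game_y'])
result = freq.stack().reset_index(name='overlap')

# game_x < game_y keeps one row per unordered pair and removes game_x == game_y.
unique_pairs = result[result['game_x'] < result['game_y']].reset_index(drop=True)
print(unique_pairs)
```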

Thanks a lot. I had missed the freq.stack() part. – erwanlc