2017-07-16 86 views
2

Pandas join duplication error

I have three dataframes:

import pandas as pd

maindf = pd.DataFrame({'Risk':['AB','AC','AD'],'amnt':[100,200,300]})

maindf 
Out[4]: 
    Risk amnt 
0 AB 100 
1 AC 200 
2 AD 300 

disc = pd.DataFrame({'Risk':['AB','AB','AB','AC','AC','AD'], 'discPerc':[0.4,0.5,0.1,0.5,0.5,1]}) 

disc 
Out[7]: 
    Risk discPerc 
0 AB  0.4 
1 AB  0.5 
2 AB  0.1 
3 AC  0.5 
4 AC  0.5 
5 AD  1.0 

ops = pd.DataFrame({'Risk':['AB','AB','AC','AC','AD','AD'], 'opsPerc':[0.5,0.5,0.4,0.6,0.2,0.8]}) 

ops 
Out[9]: 
    Risk opsPerc 
0 AB  0.5 
1 AB  0.5 
2 AC  0.4 
3 AC  0.6 
4 AD  0.2 
5 AD  0.8 

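I want to join the dataframes onto maindf so that, if I ever need to groupby the column 'Risk', discPerc and opsPerc each sum to 1 per group (as they do in the disc/ops dataframes).

A simple double left join results in: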

merged = pd.merge(maindf,disc,on='Risk',how='left') 

merged = pd.merge(merged,ops, on = 'Risk', how = 'left') 

merged 
Out[19]: 
    Risk amnt discPerc opsPerc 
0 AB 100  0.4  0.5 
1 AB 100  0.4  0.5 
2 AB 100  0.5  0.5 
3 AB 100  0.5  0.5 
4 AB 100  0.1  0.5 
5 AB 100  0.1  0.5 
6 AC 200  0.5  0.4 
7 AC 200  0.5  0.6 
8 AC 200  0.5  0.4 
9 AC 200  0.5  0.6 
10 AD 300  1.0  0.2 
11 AD 300  1.0  0.8 

and grouping on this gives:

merged.groupby('Risk').sum() 
Out[20]: 
     amnt discPerc opsPerc 
Risk       
AB  600  2.0  3.0 
AC  800  2.0  2.0 
AD  600  2.0  1.0 
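The inflated sums appear to come from the second merge being many-to-many on 'Risk': every disc row for a key is paired with every ops row for that key. A minimal sketch to check this, using the three frames defined above (the names `disc_rows`, `ops_rows`, `merged_rows` and `expected_rows` are just for illustration):

```python
import pandas as pd

maindf = pd.DataFrame({'Risk': ['AB', 'AC', 'AD'], 'amnt': [100, 200, 300]})
disc = pd.DataFrame({'Risk': ['AB', 'AB', 'AB', 'AC', 'AC', 'AD'],
                     'discPerc': [0.4, 0.5, 0.1, 0.5, 0.5, 1]})
ops = pd.DataFrame({'Risk': ['AB', 'AB', 'AC', 'AC', 'AD', 'AD'],
                    'opsPerc': [0.5, 0.5, 0.4, 0.6, 0.2, 0.8]})

# The same double left join as in the question.
merged = maindf.merge(disc, on='Risk', how='left').merge(ops, on='Risk', how='left')

# Count rows per Risk in each input and in the merged result.
disc_rows = disc.groupby('Risk').size()
ops_rows = ops.groupby('Risk').size()
merged_rows = merged.groupby('Risk').size()

# Each key contributes (disc rows) x (ops rows) rows to the merged frame.
expected_rows = disc_rows * ops_rows

print(pd.DataFrame({'disc': disc_rows, 'ops': ops_rows,
                    'expected': expected_rows, 'merged': merged_rows}))
```

For 'AB' that is 3 x 2 = 6 rows, so each discPerc value is repeated twice and each opsPerc value three times, which is why neither column sums to 1 after grouping.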

Instead, I would like the merged dataframe to look like:

Risk amnt discPerc opsPerc 
0 AB 100  0.4  nan 
1 AB 100  0.5  nan 
2 AB 100  0.1  nan 
3 AB 100  nan  0.5 
4 AB 100  nan  0.5 
6 AC 200  0.5  nan 
7 AC 200  0.5  nan 
8 AC 200  nan  0.4 
9 AC 200  nan  0.6 
10 AD 300  1.0  nan 
11 AD 300  nan  0.2 
12 AD 300  nan  0.8 

so that I can sum back up and get percentages that total 1.

Answers

5

You can concat disc and ops, then merge the result with the original dataframe:

pd.concat((disc, ops)).merge(maindf) 
Out: 
    Risk discPerc opsPerc amnt 
0 AB  0.4  NaN 100 
1 AB  0.5  NaN 100 
2 AB  0.1  NaN 100 
3 AB  NaN  0.5 100 
4 AB  NaN  0.5 100 
5 AC  0.5  NaN 200 
6 AC  0.5  NaN 200 
7 AC  NaN  0.4 200 
8 AC  NaN  0.6 200 
9 AD  1.0  NaN 300 
10 AD  NaN  0.2 300 
11 AD  NaN  0.8 300 
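To confirm that the stacked frame sums back to 1 per 'Risk', here is a small sketch under the same sample data. The names `stacked` and `summary` are illustrative, and taking `'first'` for amnt is an assumption about what the asker wants, since amnt is still repeated on every row of a group:

```python
import pandas as pd

maindf = pd.DataFrame({'Risk': ['AB', 'AC', 'AD'], 'amnt': [100, 200, 300]})
disc = pd.DataFrame({'Risk': ['AB', 'AB', 'AB', 'AC', 'AC', 'AD'],
                     'discPerc': [0.4, 0.5, 0.1, 0.5, 0.5, 1]})
ops = pd.DataFrame({'Risk': ['AB', 'AB', 'AC', 'AC', 'AD', 'AD'],
                    'opsPerc': [0.5, 0.5, 0.4, 0.6, 0.2, 0.8]})

# Stack disc and ops vertically (missing columns become NaN), then attach amnt.
stacked = pd.concat((disc, ops)).merge(maindf)

# Sum the percentage columns (NaN is skipped by default),
# but take amnt only once per group instead of summing the repeats.
summary = stacked.groupby('Risk').agg(
    {'amnt': 'first', 'discPerc': 'sum', 'opsPerc': 'sum'})

print(summary)
```

A plain `stacked.groupby('Risk').sum()` would also give percentages of 1, but it would add amnt once per stacked row (500 for 'AB', which has five rows). The row order of `stacked` may differ between pandas versions; the grouped sums do not depend on it.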
+1

Nice solution +1 – Wen

+0

Thanks @Wen. :) – ayhan

+1

@ayhan He thought it was so good that he gave you a plusTwo – piRSquared