2017-08-31 90 views
0

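Merging and formatting in pandas

I'm learning pandas and I'm having trouble merging two DataFrames. It may be more of a formatting problem, but even after a lot of trial and research I haven't figured it out.

Suppose we have two maths tutoring classes and we want to know which students attend both.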

DataFrame A:

Id  Subject  Students_A
1   Maths    Ron
2   Maths    Harry
3   Maths    Hermionie
4   Maths    Draco

DataFrame B:

Id  Subject  Students_B
1   Maths    Harry
2   Maths    Draco
3   Maths    Neville

Now, in a Jupyter notebook, I do this:

df_common = pd.merge(df_A, df_B, left_on='Students_A', right_on='Students_B', how='outer')

And I get this:

Id  Subject_x  Students_A  Subject_y  Students_B
1   Maths      Ron         NaN        NaN
2   Maths      Harry       Maths      Harry
3   Maths      Hermionie   NaN        NaN
4   Maths      Draco       Maths      Draco
5   NaN        NaN         Maths      Neville

However, I want something like this:

Id  Subject  Students_A  Students_B
1   Maths    Ron         NaN
2   Maths    Harry       Harry
3   Maths    Hermionie   NaN
4   Maths    Draco       Draco
5   Maths    NaN         Neville

What am I doing wrong? Thanks!

Answers

1

Merge on both Subject and Students:

df1.merge(df2, how="outer", 
       left_on=["Subject","Students_A"], 
       right_on=["Subject","Students_B"]) 

  Subject Students_A Students_B
0   Maths        Ron        NaN
1   Maths      Harry      Harry
2   Maths  Hermionie        NaN
3   Maths      Draco      Draco
4   Maths        NaN    Neville

Note: this assumes Id can be used as the index, e.g.

df1 = pd.read_clipboard(index_col="Id") 

   Subject Students_A
Id
1    Maths        Ron
2    Maths      Harry
3    Maths  Hermionie
4    Maths      Draco
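
For anyone who wants to try this without copying the tables to the clipboard, here is a minimal self-contained sketch. It assumes the two frames look like the ones in the question and builds them by hand with Id as the index; the names `merged` and `both` are only illustrative. Because "Subject" appears under the same name in both key lists, pandas keeps a single Subject column instead of adding the _x/_y suffixes.

```python
import pandas as pd

# Rebuild the example frames by hand (assumption: Id is used as the index).
df1 = pd.DataFrame(
    {"Subject": ["Maths"] * 4,
     "Students_A": ["Ron", "Harry", "Hermionie", "Draco"]},
    index=pd.Index([1, 2, 3, 4], name="Id"),
)
df2 = pd.DataFrame(
    {"Subject": ["Maths"] * 3,
     "Students_B": ["Harry", "Draco", "Neville"]},
    index=pd.Index([1, 2, 3], name="Id"),
)

# Outer merge on the (Subject, student name) pairs.
merged = df1.merge(
    df2,
    how="outer",
    left_on=["Subject", "Students_A"],
    right_on=["Subject", "Students_B"],
)
print(merged)

# Students who attend both classes have a name in both student columns.
both = merged.dropna(subset=["Students_A", "Students_B"])
print(both)
```

The row order of an outer merge can differ between pandas versions, so sort the result explicitly if the order matters.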

0

Try these statements after the merge:

df_common["Subject"] = df_common["Subject_x"].fillna(df_common["Subject_y"])
df_common = df_common.drop(["Subject_x", "Subject_y"], axis=1)

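Basically, when you perform the join, Subject is renamed to Subject_x and Subject_y so that you can tell the two apart. To combine them, create a new column named Subject that takes the non-null values from Subject_x and, where those are null, takes the values from Subject_y. Then drop Subject_x and Subject_y.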
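
Put together, the whole thing might look like the sketch below. It is a rough end-to-end example that assumes frames shaped like the ones in the question; the Id column is left out for brevity, and `drop(columns=...)` is used as an equivalent spelling of `drop(..., axis=1)`.

```python
import pandas as pd

# Assumed example data, modelled on the question (Id omitted for brevity).
df_A = pd.DataFrame({"Subject": ["Maths"] * 4,
                     "Students_A": ["Ron", "Harry", "Hermionie", "Draco"]})
df_B = pd.DataFrame({"Subject": ["Maths"] * 3,
                     "Students_B": ["Harry", "Draco", "Neville"]})

# Subject is in both frames but is not a join key here,
# so the merge returns it twice, as Subject_x and Subject_y.
df_common = pd.merge(df_A, df_B, left_on="Students_A",
                     right_on="Students_B", how="outer")

# Take Subject_x where it is present, otherwise fall back to Subject_y.
df_common["Subject"] = df_common["Subject_x"].fillna(df_common["Subject_y"])

# The suffixed helper columns are no longer needed.
df_common = df_common.drop(columns=["Subject_x", "Subject_y"])
print(df_common)
```

The new Subject column ends up last; reorder with `df_common[["Subject", "Students_A", "Students_B"]]` if you want it first.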

+1

'df_common.drop(["Subject_x", "Subject_y"], axis=1)' – Wen

+0

Aha, why didn't I think of that. I thought there was some way to do it with just the merge statement. –