2016-10-16 24 views
2

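Concatenate pandas DataFrames, keeping only rows with matching values in a column?

I want to "merge-concatenate" two pandas DataFrames. Basically, I want to stack the two DataFrames, but keep only the rows in each DataFrame whose value matches a value in the other DataFrame. So, for example: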

data1: 

+---+------------+-----------+-------+
|   | first_name | last_name | class |
+---+------------+-----------+-------+
| 0 | Alex       | Anderson  |     1 |
| 1 | Amy        | Ackerman  |     2 |
| 2 | Allen      | Ali       |     3 |
| 3 | Alice      | Aoni      |     4 |
| 4 | Andrew     | Andrews   |     4 |
| 5 | Ayoung     | Atiches   |     5 |
+---+------------+-----------+-------+

data2: 

+---+------------+-----------+-------+
|   | first_name | last_name | class |
+---+------------+-----------+-------+
| 0 | Billy      | Bonder    |     4 |
| 1 | Brian      | Black     |     5 |
| 2 | Bran       | Balwner   |     6 |
| 3 | Bryce      | Brice     |     7 |
| 4 | Betty      | Btisan    |     8 |
| 5 | Bruce      | Bronson   |     8 |
+---+------------+-----------+-------+

The DataFrame produced by performing this operation on data1 and data2 should then look like this:

result: 

+---+------------+-----------+-------+
|   | first_name | last_name | class |
+---+------------+-----------+-------+
| 3 | Alice      | Aoni      |     4 |
| 4 | Andrew     | Andrews   |     4 |
| 5 | Ayoung     | Atiches   |     5 |
| 0 | Billy      | Bonder    |     4 |
| 1 | Brian      | Black     |     5 |
+---+------------+-----------+-------+
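
To make the example reproducible, the three tables above could be built roughly as follows. This is only a sketch: the variable name expected is my own label for the result table, and the integer dtype of class is an assumption based on how the values are displayed.

```python
import pandas as pd

# The two input frames, copied from the tables above.
data1 = pd.DataFrame({
    "first_name": ["Alex", "Amy", "Allen", "Alice", "Andrew", "Ayoung"],
    "last_name": ["Anderson", "Ackerman", "Ali", "Aoni", "Andrews", "Atiches"],
    "class": [1, 2, 3, 4, 4, 5],
})
data2 = pd.DataFrame({
    "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty", "Bruce"],
    "last_name": ["Bonder", "Black", "Balwner", "Brice", "Btisan", "Bronson"],
    "class": [4, 5, 6, 7, 8, 8],
})

# The desired result: only rows whose "class" value occurs in both frames
# (4 and 5), with each row keeping its original index label.
# "expected" is a hypothetical name used here for illustration.
expected = pd.DataFrame(
    {
        "first_name": ["Alice", "Andrew", "Ayoung", "Billy", "Brian"],
        "last_name": ["Aoni", "Andrews", "Atiches", "Bonder", "Black"],
        "class": [4, 4, 5, 4, 5],
    },
    index=[3, 4, 5, 0, 1],
)
```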

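Basically, I'm trying to merge the two datasets and then stack the columns. I can think of a few ways to do this, but they are all hacks. I could merge data1 and data2 and then stack the columns, or use maps, like: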

map1 = data1['class'].map(lambda x: x in list(data2['class']))
map2 = data2['class'].map(lambda x: x in list(data1['class']))
pd.concat([data1[map1], data2[map2]]) 

But is there a more elegant solution?

Answers

1

How about this?

In [335]: cls = np.intersect1d(data1['class'], data2['class']) 

In [336]: cls 
Out[336]: array([4, 5], dtype=int64) 

In [337]: pd.concat([data1.loc[data1['class'].isin(cls)], data2.loc[data2['class'].isin(cls)]])
Out[337]:
  first_name last_name  class
3      Alice      Aoni      4
4     Andrew   Andrews      4
5     Ayoung   Atiches      5
0      Billy    Bonder      4
1      Brian     Black      5

Or, on pandas versions before 2.0 (where DataFrame.append is still available):

In [338]: data1.loc[data1['class'].isin(cls)].append(data2.loc[data2['class'].isin(cls)])
Out[338]:
  first_name last_name  class
3      Alice      Aoni      4
4     Andrew   Andrews      4
5     Ayoung   Atiches      5
0      Billy    Bonder      4
1      Brian     Black      5
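
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea wrapped in a small helper. The function name concat_matching and its parameters are made up for this example and are not part of pandas; the input data is copied from the question.

```python
import numpy as np
import pandas as pd


def concat_matching(left, right, on):
    """Stack `left` on top of `right`, keeping only rows whose `on`
    value occurs in both frames. (Hypothetical helper, not a pandas API.)"""
    # Values of the key column that are present in both frames.
    common = np.intersect1d(left[on], right[on])
    # Filter each frame with a boolean mask, then stack the survivors.
    return pd.concat([
        left.loc[left[on].isin(common)],
        right.loc[right[on].isin(common)],
    ])


data1 = pd.DataFrame({
    "first_name": ["Alex", "Amy", "Allen", "Alice", "Andrew", "Ayoung"],
    "last_name": ["Anderson", "Ackerman", "Ali", "Aoni", "Andrews", "Atiches"],
    "class": [1, 2, 3, 4, 4, 5],
})
data2 = pd.DataFrame({
    "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty", "Bruce"],
    "last_name": ["Bonder", "Black", "Balwner", "Brice", "Btisan", "Bronson"],
    "class": [4, 5, 6, 7, 8, 8],
})

result = concat_matching(data1, data2, on="class")
print(result)
```

The original index labels (3, 4, 5, 0, 1) are preserved, as in the desired output. If a fresh 0..n-1 index is preferred, passing ignore_index=True to pd.concat should do it.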