2015-08-14 109 views

Pandas: DataFrame difference function

I am trying to solve the following problem with pandas:

DataFrame 1:

Apple Banana Orange 
Orange Banana Apple 
Kiwi Lime Apple 
Banana Apple Orange 

DataFrame 2:

Orange Banana Apple 
Apple Banana Orange 
Apple Orange Apple 
Kiwi Apple Apple 

Operation:

DataFrame 1 - DataFrame 2 

Output:

Kiwi Lime Apple 
Banana Apple Orange 

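Essentially, I am working with several columns of categorical variables and want to find the rows that are in DataFrame 1 but not in DataFrame 2. I also want to keep the rows in their original order, as shown in the output above. That is, not like this: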

Banana Apple Orange 
Kiwi Lime Apple 

Answer


Consider using pandas.merge to find the rows common to both DataFrames, then dropping those matching rows from DataFrame 1.

#!/usr/bin/python
import pandas as pd

df1 = pd.DataFrame({'Categ1': ['Apple', 'Orange', 'Kiwi', 'Banana'],
                    'Categ2': ['Banana', 'Banana', 'Lime', 'Apple'],
                    'Categ3': ['Orange', 'Apple', 'Apple', 'Orange']})

df2 = pd.DataFrame({'Categ1': ['Orange', 'Apple', 'Apple', 'Kiwi'],
                    'Categ2': ['Banana', 'Banana', 'Orange', 'Apple'],
                    'Categ3': ['Apple', 'Orange', 'Apple', 'Apple']})

# MERGE BOTH DATA FRAMES (INNER JOIN: ONLY ROWS PRESENT IN BOTH)
merged = pd.merge(df1, df2, on=['Categ1', 'Categ2', 'Categ3'])

# DROP FROM ORIGINAL DF1 ANY ROWS FOUND IN MERGED
# (MATCH ON ROW VALUES: MERGED HAS A FRESH 0..N-1 INDEX, NOT DF1'S ROW LABELS)
in_both = pd.MultiIndex.from_frame(df1).isin(pd.MultiIndex.from_frame(merged))
df1 = df1[~in_both]

DataFrame output:

ORIGINAL DF1
   Categ1  Categ2  Categ3
0   Apple  Banana  Orange
1  Orange  Banana   Apple
2    Kiwi    Lime   Apple
3  Banana   Apple  Orange

MERGED DF
   Categ1  Categ2  Categ3
0   Apple  Banana  Orange
1  Orange  Banana   Apple

FINAL DF1
   Categ1  Categ2  Categ3
2    Kiwi    Lime   Apple
3  Banana   Apple  Orange
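
The same "rows in DataFrame 1 but not in DataFrame 2" idea can also be sketched as a left merge with indicator=True, which tags each row of the left frame as 'left_only' or 'both'. This is only an illustrative variant, not part of the original answer: the helper name rows_only_in_left is hypothetical, and it assumes both frames share the same column names. A left merge preserves the order of the left frame, so the result keeps the row order of DataFrame 1.

```python
import pandas as pd

df1 = pd.DataFrame({'Categ1': ['Apple', 'Orange', 'Kiwi', 'Banana'],
                    'Categ2': ['Banana', 'Banana', 'Lime', 'Apple'],
                    'Categ3': ['Orange', 'Apple', 'Apple', 'Orange']})

df2 = pd.DataFrame({'Categ1': ['Orange', 'Apple', 'Apple', 'Kiwi'],
                    'Categ2': ['Banana', 'Banana', 'Orange', 'Apple'],
                    'Categ3': ['Apple', 'Orange', 'Apple', 'Apple']})


def rows_only_in_left(left, right):
    """Hypothetical helper: rows of `left` that have no identical row in `right`."""
    cols = list(left.columns)
    # De-duplicate `right` so every row of `left` produces exactly one merged row.
    marked = left.merge(right[cols].drop_duplicates(),
                        on=cols, how='left', indicator=True)
    # The left merge keeps `left`'s row order, so the mask lines up by position.
    mask = (marked['_merge'] == 'left_only').to_numpy()
    # Boolean indexing on `left` keeps its original index labels and order.
    return left[mask]


result = rows_only_in_left(df1, df2)
print(result)
```

With the sample data above, result holds the rows labelled 2 and 3 of df1 (Kiwi/Lime/Apple and Banana/Apple/Orange), in that order.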