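Python: How to get values that are in one DataFrame but not the other

I'm working with two very similar DataFrames and am trying to figure out how to get the data that is in one but not the other, and vice versa.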
Here is my code so far:
import pandas as pd
import numpy as np
def report_diff(x):
    # x holds one cell's (old, new) pair; show both values only when they differ
    return x.iloc[0] if x.iloc[0] == x.iloc[1] else '{} ---> {}'.format(*x)
old = pd.read_excel('File 1')
new = pd.read_excel('File 2')
old['version'] = 'old'
new['version'] = 'new'
full_set = pd.concat([old,new],ignore_index=True)
changes = full_set.drop_duplicates(subset=['ID','Type', 'Total'], keep='last')
duplicated = changes.duplicated(subset=['ID', 'Type'], keep=False)
dupe_accts = changes[duplicated]
change_new = dupe_accts[(dupe_accts['version'] == 'new')]
change_old = dupe_accts[(dupe_accts['version'] == 'old')]
change_new = change_new.drop(['version'], axis=1)
change_old = change_old.drop(['version'],axis=1)
change_new.set_index('Employee ID', inplace=True)
change_old.set_index('Employee ID', inplace=True)
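# pd.Panel was deprecated in pandas 0.20 and has since been removed,
# so these two lines only run on an old pandas version.
diff_panel = pd.Panel(dict(df1=change_old, df2=change_new))
diff_output = diff_panel.apply(report_diff, axis=0)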
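Because `pd.Panel` no longer exists in current pandas, the last two lines of the code above fail there. A minimal sketch of the same cell-by-cell "old ---> new" report built from two plain DataFrames follows, assuming `change_old` and `change_new` share the `Employee ID` index and the same column names. `report_diff_frames` is a hypothetical helper name, and the sample frames are made up for illustration.

```python
import pandas as pd

def report_diff_frames(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Cell-by-cell report: keep equal values, show 'old ---> new' otherwise."""
    # Align on the union of index and column labels, as Panel did implicitly.
    old, new = old.align(new)
    # Treat two missing values as equal (NaN != NaN otherwise).
    same = (old == new) | (old.isna() & new.isna())
    arrows = old.astype(str) + ' ---> ' + new.astype(str)
    # Cast to object first so numeric columns can hold the arrow strings.
    return old.astype(object).where(same, arrows)

# Hypothetical sample data, indexed like change_old / change_new.
idx = pd.Index([1, 2], name='Employee ID')
change_old = pd.DataFrame({'Type': ['A', 'B'], 'Total': [10, 20]}, index=idx)
change_new = pd.DataFrame({'Type': ['A', 'C'], 'Total': [10, 25]}, index=idx)

diff_output = report_diff_frames(change_old, change_new)
print(diff_output)
```

On pandas 1.1 or later, `DataFrame.compare` is a built-in alternative for identically labeled frames, though it lays the differences out side by side rather than as arrow strings.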
So the next step is to get the data that exists only in old and only in new.
My first attempt was:
changes['duplicate']=changes['Employee ID'].isin(dupe_accts)
removed_accounts = changes[(changes['duplicate'] == False) & (changes['version'] =='old')]
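In the attempt above, `Series.isin` receives the whole `dupe_accts` DataFrame. Iterating a DataFrame yields its column labels, so the Employee IDs are most likely being compared against column names, leaving the `duplicate` flag False almost everywhere. A minimal sketch of the intended check passes the key column instead (for the original variables, something like `changes['Employee ID'].isin(dupe_accts['Employee ID'])`). `only_in` is a hypothetical helper, and the sample frames are invented for illustration.

```python
import pandas as pd

def only_in(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return the rows of `left` whose `key` value never appears in `right`."""
    # Compare against the key *column* of `right`; passing the whole
    # DataFrame to isin() would compare against its column labels instead.
    return left[~left[key].isin(right[key])]

# Hypothetical sample data using the question's column names.
old = pd.DataFrame({'Employee ID': [1, 2, 3], 'Type': ['A', 'B', 'C'], 'Total': [10, 20, 30]})
new = pd.DataFrame({'Employee ID': [2, 3, 4], 'Type': ['B', 'C', 'D'], 'Total': [20, 35, 40]})

removed_accounts = only_in(old, new, 'Employee ID')  # IDs only in old
added_accounts = only_in(new, old, 'Employee ID')    # IDs only in new
print(removed_accounts)
print(added_accounts)
```

This compares keys only, so an employee whose `Total` changed appears in neither result; the `duplicated` logic in the original code is what picks those up.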
Fixed the quoting in new = pd.read_excel('File 2').
http://stackoverflow.com/questions/20225110/comparing-two-dataframes-and-getting-the-differences – Dadep
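Along the lines of the linked question, a minimal sketch of a whole-row comparison uses an outer `merge` with `indicator=True`, which tags every row as `left_only`, `right_only` or `both`. This assumes both frames have the same columns and that it runs *before* the `version` column is added (otherwise no row would ever match). `rows_only_in_each` is a hypothetical helper and the data is made up.

```python
import pandas as pd

def rows_only_in_each(old: pd.DataFrame, new: pd.DataFrame):
    """Return (rows only in old, rows only in new), comparing whole rows."""
    # With no `on=`, merge joins on every column the two frames share.
    # indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
    merged = old.drop_duplicates().merge(
        new.drop_duplicates(), how='outer', indicator=True
    )
    only_old = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
    only_new = merged[merged['_merge'] == 'right_only'].drop(columns='_merge')
    return only_old, only_new

# Hypothetical sample data with the question's column names.
old = pd.DataFrame({'Employee ID': [1, 2, 3], 'Type': ['A', 'B', 'C'], 'Total': [10, 20, 30]})
new = pd.DataFrame({'Employee ID': [2, 3, 4], 'Type': ['B', 'C', 'D'], 'Total': [20, 35, 40]})

only_old, only_new = rows_only_in_each(old, new)
print(only_old)
print(only_new)
```

Here employee 3 shows up on both sides because its `Total` changed; rows that differ only in some values count as "in one but not the other" under a whole-row comparison.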