2017-02-23 108 views
3

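Comparing two CSV files with Python pandas

I have two CSV files, each made up of two columns.

The first column holds the product ID, the second holds the serial number.

I need to take every serial number from the first CSV and look for a match in the second CSV. The resulting report should contain the matched serial number together with the corresponding product ID from each CSV, in separate columns. I tried modifying the code below, with no luck.

How would you approach this?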

import pandas as pd
A = set(pd.read_csv("c1.csv", index_col=False, header=None)[0])  # reads the CSV, takes only the first column and builds a set from it
B = set(pd.read_csv("c2.csv", index_col=False, header=None)[0])  # same here
print(A - B)  # set A minus set B: everything that is only in A
print(B - A)  # same here, the other way around
+0

Could you add some sample data and the expected output? It is a bit unclear what exactly is needed. – jezrael

Answers

4

I think you need merge:

import pandas as pd

A = pd.DataFrame({'product id': [1455, 5452, 3775],
                  'serial number': [44, 55, 66]})

print(A)

B = pd.DataFrame({'product id': [7000, 2000, 1000],
                  'serial number': [44, 55, 77]})

print(B)

# inner merge: keeps only the serial numbers found in both frames
print(pd.merge(A, B, on='serial number'))
   product id_x  serial number  product id_y
0          1455             44          7000
1          5452             55          2000
+0

Just one small change is needed: in the snippet above, how can I pass two file names as input instead of hard-coded values? – poyim
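One possible way to do what the comment above asks is to read both files with `pd.read_csv` and merge the resulting frames. This is only a sketch under assumptions that the thread does not confirm: the helper name `match_serials` is hypothetical, the files are assumed to have no header row, and the product ID is assumed to be the first column with the serial number second.

```python
import pandas as pd


def match_serials(path_a, path_b):
    """Return the serial numbers found in both files, with each file's product ID."""
    # Assumption: two columns per file, product ID first, serial number second,
    # and no header row. Adjust `names`/`header` if the real files differ.
    cols = ["product id", "serial number"]
    a = pd.read_csv(path_a, header=None, names=cols)
    b = pd.read_csv(path_b, header=None, names=cols)
    # An inner merge keeps only the serial numbers present in both files;
    # the suffixes mark which file each product ID came from.
    return pd.merge(a, b, on="serial number", suffixes=("_c1", "_c2"))
```

Calling `match_serials("c1.csv", "c2.csv")` would then return one row per matched serial number, and the result can be written out with `to_csv` if a report file is wanted.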

-1
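import pandas as pd

# file_path and second_file_path are placeholders for the two CSV paths
first_one = pd.read_csv(file_path)
second_one = pd.read_csv(second_file_path)  # same way for second_one
# if product_id is the first column, its position is 0
# note: this compares the files row by row, so it only finds values
# that sit on the same row in both files
for i in range(min(len(first_one), len(second_one))):
    if first_one.iloc[i, 0] == second_one.iloc[i, 0]:
        # it is a match; do whatever you want with this matched data
        print(first_one.iloc[i, 0])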
3

Try this:

A = pd.read_csv("c1.csv", header=None, usecols=[0], names=['col']).drop_duplicates() 
B = pd.read_csv("c2.csv", header=None, usecols=[0], names=['col']).drop_duplicates() 
# A - B 
pd.merge(A, B, on='col', how='left', indicator=True).query("_merge == 'left_only'") 
# B - A 
pd.merge(A, B, on='col', how='right', indicator=True).query("_merge == 'right_only'") 
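To make the role of `indicator=True` more concrete, here is a small sketch on in-memory data rather than the CSV files. The sample values are made up for illustration, and it uses a single outer merge with boolean masks instead of the two merges with `query` shown above. With `indicator=True`, pandas adds a `_merge` column whose value is `left_only`, `right_only` or `both` for each row.

```python
import pandas as pd

# Made-up sample data standing in for the first column of each CSV.
A = pd.DataFrame({"col": [44, 55, 66]})
B = pd.DataFrame({"col": [44, 55, 77]})

# One outer merge labels every key with where it was found.
merged = pd.merge(A, B, on="col", how="outer", indicator=True)

only_in_a = merged.loc[merged["_merge"] == "left_only", "col"].tolist()
only_in_b = merged.loc[merged["_merge"] == "right_only", "col"].tolist()
in_both = merged.loc[merged["_merge"] == "both", "col"].tolist()

print(only_in_a, only_in_b, in_both)
```

Doing one outer merge and filtering three ways avoids reading or merging the data twice, at the cost of holding the union of keys in memory.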
0

You can convert each DataFrame into a set of row tuples, so that the comparison ignores the index, and then use set symmetric_difference:

ds1 = set(tuple(values) for values in df1.values.tolist())
ds2 = set(tuple(values) for values in df2.values.tolist())

ds1.symmetric_difference(ds2)  # rows that appear in only one of the two frames
print(df1, '\n\n')
print(df2, '\n\n')

print(pd.DataFrame(list(ds1.difference(ds2))), '\n\n')  # rows only in df1
print(pd.DataFrame(list(ds2.difference(ds1))), '\n\n')  # rows only in df2

DF1

    id  Name  score  isEnrolled               Comment
0  111  Jack   2.17        True  He was late to class
1  112  Nick   1.11       False             Graduated
2  113   Zoe   4.12        True                   NaN

DF2

    id  Name  score  isEnrolled               Comment
0  111  Jack   2.17        True  He was late to class
1  112  Nick   1.21       False             Graduated
2  113   Zoe   4.12       False           On vacation

Output

     0     1     2      3          4
0  113   Zoe  4.12   True        NaN
1  112  Nick  1.11  False  Graduated


     0     1     2      3            4
0  113   Zoe  4.12  False  On vacation
1  112  Nick  1.21  False    Graduated