2014-02-12

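How do I correctly loop over two files and compare the strings in both?

I want to compare two CSV files and produce the output shown below, but I can't get it to work. Here are my example files and my code.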

File1.csv

meNOG00110,9606.ENSP00000349259,1,2364 

meNOG06332,9606.ENSP00000344967,1,322 

meNOG06773,9606.ENSP00000344961,1,379 

meNOG03133,9606.ENSP00000387429,1,2089 

meNOG17468,9606.ENSP00000217169,1,298 

File2.csv

meNOG06332,9606.ENSP00000344967,1,322 

meNOG00110,9606.ENSP00000349259,1,2364 

meNOG00110,9606.ENSP00000357130,1,2419 

meNOG00018,10090.ENSMUSP00000027367,1,261 

meNOG00018,10090.ENSMUSP00000072852,1,276 

output.txt

meNOG06332 9606.ENSP00000344967 1 322 

meNOG00110 9606.ENSP00000349259 1 2364 

meNOG00018 10090.ENSMUSP00000027367 1 261 

meNOG00018 10090.ENSMUSP00000072852 1 276 

Code:

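import csv

# Open both files and wrap each one in a CSV reader
file1 = open("File1.csv", newline="")
reader1 = csv.reader(file1, delimiter=',')

file2 = open("File2.csv", newline="")
reader2 = csv.reader(file2, delimiter=',')

# Try to compare every row of File2 against every row of File1
for row2 in reader2:
    for row1 in reader1:
        if row2[1].startswith('9606'):
            if row2[1] == row1[1]:
                print(row2)
        else:
            print(row2)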

But this code only searches the first row.

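@g.d.d.c That's not what the OP expects. He is looking for all N^2 comparisons. – wflynny

It looks like his code just prints `row2` no matter what happens. That's probably not the intent... – colcarroll

To do this, you need to seek back to the beginning of file1 inside the first for loop. Once the inner loop has iterated through file1 the first time, the file does not return to its start for the later passes. – Neil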
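
As a minimal sketch of the rewind described in the last comment, the snippet below calls `seek(0)` at the top of the outer loop so the inner loop always starts from the first row. The `io.StringIO` objects are stand-ins for File1.csv and File2.csv (an assumption made so the example runs without files on disk), and the `matches` list is a name introduced only for illustration.

```python
import csv
import io

# In-memory stand-ins for File1.csv and File2.csv (rows taken from the question)
file1 = io.StringIO(
    "meNOG00110,9606.ENSP00000349259,1,2364\n"
    "meNOG06332,9606.ENSP00000344967,1,322\n"
)
file2 = io.StringIO(
    "meNOG06332,9606.ENSP00000344967,1,322\n"
    "meNOG00110,9606.ENSP00000349259,1,2364\n"
)

matches = []
for row2 in csv.reader(file2):
    file1.seek(0)  # rewind so the inner loop starts from the first row again
    for row1 in csv.reader(file1):
        if row2[1] == row1[1]:
            matches.append(row2)
            break  # stop at the first match to avoid duplicates

for row in matches:
    print(" ".join(row))
```

This rescans the first file once per row of the second file, so it is probably only reasonable for small inputs.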

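Answers

I'm not sure this is exactly what you're looking for, since the question isn't entirely clear, but if you're looking for the overlap between the two files and want to compare whole rows, you can build two sets (one per file) and output their intersection: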

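with open('File1.csv', 'r') as infile1, \
     open('File2.csv', 'r') as infile2, \
     open('File3.csv', 'w') as outfile:
    # Each set holds the raw text lines of one file
    lines1 = set(infile1)
    lines2 = set(infile2)

    # Lines present in both files; set iteration order is arbitrary.
    # The lines are plain strings that already end in a newline, so
    # write them directly (csv.writer would split them into characters).
    for line in (lines1 & lines2):
        outfile.write(line)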
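
If the goal is the output listed in the question rather than a whole-row intersection, one possible reading is this: keep a File2 row when its second column does not start with `9606`, or when that column also appears in File1. That rule is an assumption inferred from the asker's code and expected output, and `select_rows`, `FILE1` and `FILE2` are hypothetical names used only in this sketch.

```python
import csv
import io

# Sample data copied from the question, held in memory for the example
FILE1 = """\
meNOG00110,9606.ENSP00000349259,1,2364
meNOG06332,9606.ENSP00000344967,1,322
meNOG06773,9606.ENSP00000344961,1,379
meNOG03133,9606.ENSP00000387429,1,2089
meNOG17468,9606.ENSP00000217169,1,298
"""
FILE2 = """\
meNOG06332,9606.ENSP00000344967,1,322
meNOG00110,9606.ENSP00000349259,1,2364
meNOG00110,9606.ENSP00000357130,1,2419
meNOG00018,10090.ENSMUSP00000027367,1,261
meNOG00018,10090.ENSMUSP00000072852,1,276
"""


def select_rows(file1, file2):
    """Return the File2 rows to keep under the assumed rule:
    keep non-9606 rows, and 9606 rows whose ID also occurs in File1."""
    # Read File1 once and remember every ID from its second column
    ids1 = {row[1] for row in csv.reader(file1) if row}
    kept = []
    for row in csv.reader(file2):
        if not row:
            continue  # skip blank lines
        if not row[1].startswith("9606") or row[1] in ids1:
            kept.append(row)
    return kept


result = select_rows(io.StringIO(FILE1), io.StringIO(FILE2))
for row in result:
    print(" ".join(row))
```

Reading File1 into a set first means each file is scanned only once, and the rows come out in File2 order.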

You can zip the two files together:

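path_a, path_b = 'a.txt', 'b.txt'  # placeholder paths
with open(path_a, 'r') as a, open(path_b, 'r') as b:
    for line_a, line_b in zip(a, b):
        # Strip newlines so each pair prints on one line
        print(line_a.rstrip('\n'), line_b.rstrip('\n'))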

If the first file is:

a 
s 
d 
f 

and the second file is:

q 
w 
e 
r 

the output will be:

a q 
s w 
d e 
f r
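
Note that `zip` pairs lines by position, so it only compares row 1 with row 1, row 2 with row 2, and so on. The sketch below uses the ID columns from the question's two files to show the difference between positional pairing and a membership test. The variable names `ids1`, `ids2`, `positional` and `shared` are introduced here only for illustration.

```python
# Second-column IDs from File1.csv and File2.csv in the question
ids1 = [
    "9606.ENSP00000349259",
    "9606.ENSP00000344967",
    "9606.ENSP00000344961",
    "9606.ENSP00000387429",
    "9606.ENSP00000217169",
]
ids2 = [
    "9606.ENSP00000344967",
    "9606.ENSP00000349259",
    "9606.ENSP00000357130",
    "10090.ENSMUSP00000027367",
    "10090.ENSMUSP00000072852",
]

# Positional pairing: an ID matches only if it sits on the same line in both files
positional = [a for a, b in zip(ids1, ids2) if a == b]

# Membership test: an ID matches if it appears anywhere in the first file
in_file1 = set(ids1)
shared = [b for b in ids2 if b in in_file1]

print(positional)  # []
print(shared)      # ['9606.ENSP00000344967', '9606.ENSP00000349259']
```

Because the two shared IDs appear on different lines in each file, positional pairing finds nothing here, while the membership test finds both.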