2014-11-14 65 views

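Python program to compare two files and show the differences

I have the following code to compare two files. I want this program to work when I point it at files that are 4 or 5 MB in size. When I do, the prompt cursor in the Python console just blinks and no output is shown. Once I left it running all night, and the next morning it was still blinking. What can I change in this code?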

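import difflib

# Read both files completely into lists of lines
with open('/home/michel/Documents/first.csv', 'r') as file1, \
     open('/home/michel/Documents/second.csv', 'r') as file2:
    lines1 = file1.readlines()
    lines2 = file2.readlines()

# ndiff compares the files line by line and, for lines that look
# similar, also compares them character by character ('?' hint lines)
diff = difflib.ndiff(lines1, lines2)
delta = ''.join(diff)
print(delta)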

Have you checked the CPU usage? Is it at 100%? – vz0 2014-11-14 15:20:53


Possible duplicate of http://stackoverflow.com/questions/4899146/diff-two-big-files-in-python – umut 2014-11-14 15:25:34


I don't like the way that solution displays the results. I'd prefer to use context_diff or ndiff. :( – MiniGunnR 2014-11-14 15:47:57
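Since context_diff came up: a minimal sketch of producing a context diff of the two files with the standard-library `difflib.context_diff`. Unlike `ndiff`, it compares whole lines only and does not compute the character-level '?' hint lines, so on large files it is typically much faster, though not guaranteed to be fast when the files differ a lot. The helper name `context_diff_files` is hypothetical.

```python
import difflib

def context_diff_files(path1, path2, context=3):
    """Hypothetical helper: return a context diff of two text files.

    Compares whole lines only (no character-level '?' hints like ndiff).
    Returns an empty string when the files are identical.
    """
    with open(path1) as f1, open(path2) as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    # n = number of unchanged context lines shown around each change
    diff = difflib.context_diff(lines1, lines2,
                                fromfile=path1, tofile=path2, n=context)
    return ''.join(diff)
```

Usage: `print(context_diff_files('/home/michel/Documents/first.csv', '/home/michel/Documents/second.csv'))`. Swapping in `difflib.unified_diff` (same positional arguments) gives the more compact `-`/`+` format instead of `!` markers.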

Answers

0

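If you are on a Linux-based system, you can call the external diff command and use its output. I tried the diff command on two files of 14 MB and 9.3 MB; it took about 1.3 seconds: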

real 0m1.295s 
user 0m0.056s 
sys  0m0.192s 
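A minimal sketch of calling the external `diff` command from Python, assuming a Unix-like system with `diff` on the PATH and Python 3.7+ (for `capture_output` and `text`). The helper name `external_diff` is hypothetical. `diff` exits with 0 when the files are identical, 1 when they differ, and 2 on trouble such as a missing file.

```python
import subprocess

def external_diff(path1, path2):
    """Hypothetical helper: run the system `diff` on two files.

    Returns (exit_code, output). Raises RuntimeError if diff reports
    trouble (exit code 2), e.g. when a file does not exist.
    """
    result = subprocess.run(['diff', path1, path2],
                            capture_output=True, text=True)
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.returncode, result.stdout
```

Adding `'-u'` to the argument list makes `diff` print unified format, which is close to what `difflib.unified_diff` produces.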
0

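When I tried it your way with difflib, I ran into the same problem: difflib reads both large files entirely into memory and compares them as a whole, which becomes very slow. As a workaround, you can compare the two files piece by piece. Here I do it every 100 lines.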

import difflib

CHUNK_SIZE = 100  # number of lines compared per batch

lines_file1 = []
lines_file2 = []

with open('1.csv', 'r') as file1, open('2.csv', 'r') as file2:
    # i: line number (0-based)
    # line1, line2: content of line i in each file
    # note: zip() stops at the end of the shorter file
    for i, (line1, line2) in enumerate(zip(file1, file2)):
        lines_file1.append(line1)
        lines_file2.append(line2)
        # every CHUNK_SIZE lines, show the differences for this batch
        if (i + 1) % CHUNK_SIZE == 0:
            # pass the lists of lines (not joined strings), so ndiff
            # compares line by line instead of character by character
            diff = difflib.ndiff(lines_file1, lines_file2)
            print(''.join(diff))
            lines_file1 = []
            lines_file2 = []

# show the differences for any lines left over
if lines_file1:
    diff = difflib.ndiff(lines_file1, lines_file2)
    print(''.join(diff))
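The chunked comparison above can also be sketched as a generator that reads each file lazily with `itertools.islice`, so only one block per file is in memory and, unlike `zip()`, trailing lines of the longer file are still reported as `-` or `+`. The name `chunked_ndiff` is hypothetical. Like the loop above, it assumes lines stay aligned between the files: a single inserted or deleted line shifts every later block and makes the rest of the output noisy.

```python
import difflib
from itertools import islice

def chunked_ndiff(path1, path2, chunk_size=100):
    """Hypothetical helper: yield ndiff output for each block of lines.

    Reads chunk_size lines at a time from each file and keeps going
    until both files are exhausted.
    """
    with open(path1) as f1, open(path2) as f2:
        while True:
            block1 = list(islice(f1, chunk_size))
            block2 = list(islice(f2, chunk_size))
            if not block1 and not block2:
                break
            yield ''.join(difflib.ndiff(block1, block2))
```

Usage: `for chunk in chunked_ndiff('1.csv', '2.csv'): print(chunk, end='')`.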

Hope it helps.