Python diff and get the new part

My question is very basic.

I need to diff variables made of many lines and get only their new part. It's much simpler to understand with an example:

First variable:

Hello
my
name
is

Second variable:

name
is
Peter
and
I
am
blond

I need to extract:

Peter
and
I
am
blond

I need to do this on large files. How can I do it?

Thanks a lot.


2013-06-01 Alberto

Split the lines and do a diff – JBernardo

Yes, but if I do set(variable1) - set(variable2), I get 'Hello' and 'my'. I only want the new ones. Thanks anyway – Alberto

Try set(variable2) - set(variable1). –
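A minimal sketch of what the comments suggest, assuming the two variables hold the question's example as newline-separated strings (the names `variable1` and `variable2` come from the comments; storing them as single strings is an assumption). Set difference is not symmetric, so the operand order decides whose lines you get back:

```python
# The question's example, assumed to be stored as newline-separated strings
variable1 = "Hello\nmy\nname\nis"
variable2 = "name\nis\nPeter\nand\nI\nam\nblond"

# Split each string into lines and turn the lines into sets
lines1 = set(variable1.splitlines())
lines2 = set(variable2.splitlines())

removed = lines1 - lines2  # lines only in the first variable
added = lines2 - lines1    # lines only in the second variable: the "new part"

print(sorted(removed))  # ['Hello', 'my']
print(sorted(added))    # ['I', 'Peter', 'am', 'and', 'blond']
```

Sorting is only there to make the printed output stable; the sets themselves have no order.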

If duplicates and order don't matter, this is very simple:

with open('firstFile') as f:
    first = set(f)   # iterating a file yields its lines, like readlines()
with open('secondFile') as f:
    second = set(f)

diff = second - first

If output order matters:

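with open('firstFile') as f:
    first = set(f)   # a set makes each "not in" check fast
with open('secondFile') as f:
    second = f.readlines()

# Keeps the lines of the second file in their original order
diff = [line for line in second if line not in first]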
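To make the "duplicates and order" condition concrete, here is a small sketch on hypothetical in-memory lists (standing in for the lines read from the two files) in which the new part contains a repeated line. The set version collapses the repeat and returns an unordered set, while the list comprehension keeps the second input's order and its repeats:

```python
# Hypothetical inputs: "Peter" appears twice in the new part
first = ["name", "is"]
second = ["name", "is", "Peter", "and", "Peter"]

# Set difference: duplicates collapse and the result has no defined order
as_set = set(second) - set(first)

# Order-preserving filter: keeps order and repeats from `second`;
# membership is checked against a set for fast lookups
first_set = set(first)
in_order = [line for line in second if line not in first_set]

print(as_set)    # {'Peter', 'and'}, in no guaranteed order
print(in_order)  # ['Peter', 'and', 'Peter']
```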

If input order matters, then the question needs clarification.
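If "input order matters" means the lines should be compared as sequences, the way a `diff` tool does, one option is the standard library's `difflib`. This is a different technique from the set-based answers and is only a sketch on the question's example: `difflib.ndiff` prefixes lines that exist only in the second sequence with `'+ '`, so keeping just those lines gives the additions in order.

```python
import difflib

first = ["Hello", "my", "name", "is"]
second = ["name", "is", "Peter", "and", "I", "am", "blond"]

# ndiff yields each line prefixed with '- ', '+ ', '  ' or '? ' (hint lines);
# keep only the added lines and strip the two-character prefix
added = [line[2:] for line in difflib.ndiff(first, second) if line.startswith("+ ")]

print(added)  # ['Peter', 'and', 'I', 'am', 'blond']
```

Note that `difflib` holds both sequences in memory, so it does not help with the large-file case discussed below.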

If the files are so large that loading them into memory is a bad idea, you may need to do something like this:

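# The output file must be opened for writing ('w')
with open('secondFile') as secondFile, open('diffFile', 'w') as diffFile:
    for secondLine in secondFile:
        match = False
        # Re-scan the first file from the start for every line of the second
        with open('firstFile') as firstFile:
            for firstLine in firstFile:
                if firstLine == secondLine:
                    match = True
                    break
        if not match:
            # secondLine already ends with '\n', so write it unchanged
            diffFile.write(secondLine)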


2013-06-01 02:08:24

It works! Thanks – Alberto

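One more question has come up. The first input is read from a file, but the second is a variable in a buffer. For the second one I'm doing 'second = data.split("\n")', but the diff result is wrong. Do you know why? – Alberto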
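A likely cause, sketched with hypothetical in-memory stand-ins for the file and the buffer: `readlines()` keeps the trailing `'\n'` on each line, while `str.split("\n")` removes it, so `'name\n'` and `'name'` never compare equal. Normalizing both sides the same way fixes the comparison. Here `rstrip("\n")` is used, on the assumption that other surrounding whitespace is significant:

```python
import io

# Stand-in for the first file (io.StringIO behaves like an open text file)
file_obj = io.StringIO("Hello\nmy\nname\nis\n")
# Stand-in for the in-memory buffer `data` from the question
data = "name\nis\nPeter\nand\nI\nam\nblond"

first = set(file_obj.readlines())  # lines keep their '\n': {'Hello\n', 'my\n', ...}
second = set(data.split("\n"))     # lines have no '\n': {'name', 'is', ...}

broken = second - first            # every line looks new: 'name' != 'name\n'

# Remove the line terminator on the file side so both sides match
file_obj.seek(0)
first_fixed = set(line.rstrip("\n") for line in file_obj)
fixed = second - first_fixed

print(sorted(fixed))  # ['I', 'Peter', 'am', 'and', 'blond']
```

If `data` can end with a newline, `data.splitlines()` avoids the empty string that `split("\n")` would produce at the end.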

Following the comments on the question, we can do this:

# strip() removes the trailing '\n' (and surrounding whitespace) from each line
with open("tmp1.txt") as f:
    first = set(x.strip() for x in f)
with open("tmp2.txt") as f:
    second = set(x.strip() for x in f)
print(second - first)

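But if we take "large" seriously, loading the whole files before processing them may use more memory than the machine has. If the first file is small enough to fit in memory and the second is not, you can do this: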

with open("tmp1.txt") as f:
    first = set(x.strip() for x in f)
# Iterating over the file object reads the second file one line at a time
with open("tmp2.txt") as f:
    for line in f:
        line = line.strip()
        if line not in first:
            print(line)

If the first file is too large as well, I think you'll need to resort to a database.
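A minimal sketch of the database route, using Python's built-in `sqlite3` module as one possible choice (the answer does not name a database). The first file's lines go into an on-disk table whose primary key provides an index, so memory use stays small, and each line of the second file is looked up through that index. The function name `new_lines` and the database path `lines.db` are hypothetical:

```python
import sqlite3


def new_lines(first_path, second_path, db_path="lines.db"):
    """Yield lines of second_path (without the trailing newline) not found in first_path."""
    conn = sqlite3.connect(db_path)
    try:
        # PRIMARY KEY creates an index, so lookups don't scan the whole table
        conn.execute("CREATE TABLE IF NOT EXISTS seen (line TEXT PRIMARY KEY)")
        conn.execute("DELETE FROM seen")  # drop rows left over from a previous run
        with open(first_path) as f:
            # executemany consumes the generator lazily; OR IGNORE skips duplicate lines
            conn.executemany(
                "INSERT OR IGNORE INTO seen (line) VALUES (?)",
                ((line.rstrip("\n"),) for line in f),
            )
        conn.commit()
        with open(second_path) as f:
            for line in f:
                line = line.rstrip("\n")
                row = conn.execute("SELECT 1 FROM seen WHERE line = ?", (line,)).fetchone()
                if row is None:
                    yield line
    finally:
        conn.close()
```

Usage would look like `for line in new_lines("firstFile", "secondFile"): print(line)`. The database file is left on disk afterwards; delete it if it's no longer needed.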


2013-06-01 02:10:14

It works! Thanks – Alberto
