Python: parsing two large files simultaneously, line by line

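I want to parse two files of roughly 6 GB each. I need to parse them simultaneously, because I need two lines at the same time (one from each file). I tried something like this: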

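with open(fileOne, "r") as First_file:
    for index, line in enumerate(First_file):
        # Do some stuff here
        pass

        # fileTwo is re-opened on every outer iteration,
        # so this inner loop restarts at its first line each time
        with open(fileTwo, "r") as Second_file:
            for index2, line2 in enumerate(Second_file):
                # Do stuff here as well
                pass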

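The problem is that the second `open` loop starts again from the beginning of the file on every iteration, so the parse takes a very long time. I also tried this: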

with open(fileOne, "r") as f1, open(fileTwo, "r") as f2:
    for index, (line_R1, line_R2) in enumerate(zip(f1, f2)):
        # Do stuff with line_R1 and line_R2 here
        pass

The problem is that both files are loaded straight into memory. I need the same lines from each file. The lines I want are those where:

number_line%4 == 1

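With the 0-based index from `enumerate`, this selects indices 1, 5, 9, 13, etc. (the 2nd, 6th, 10th, 14th lines when counting from 1). I need these lines from both files.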
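A quick check of what the selection rule picks out, assuming `number_line` is the 0-based index that `enumerate()` produces (the question does not state which counting it uses):

```python
# Which 0-based enumerate indices satisfy number_line % 4 == 1?
selected = [number_line for number_line in range(16) if number_line % 4 == 1]
print(selected)  # [1, 5, 9, 13]

# The same positions expressed as 1-based line numbers
one_based = [number_line + 1 for number_line in selected]
print(one_based)  # [2, 6, 10, 14]
```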

Is there a faster and more memory-efficient way to do this?


2014-05-14 TheBumpper

Python 2, right? –

Yes, I'm programming in Python 2.7. – TheBumpper

Just throwing this out there in case it's useful for your use case: https://docs.python.org/2/library/difflib.html – netcoder

In Python 2, use itertools.izip() to keep the files from being loaded into memory:

from itertools import izip

with open(fileOne, "r") as f1, open(fileTwo, "r") as f2:
    for index, (line_R1, line_R2) in enumerate(izip(f1, f2)):
        # izip yields one (line_R1, line_R2) pair at a time; process it here
        pass

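The built-in zip() reads both file objects into memory in full (in Python 2 it builds a complete list of line pairs), whereas izip() retrieves one line at a time.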
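A minimal sketch that combines the lazy pairing above with the question's `% 4 == 1` filter. It assumes both inputs are plain-text files; the function name `paired_selected_lines` and its `step`/`offset` parameters are hypothetical, and the `try`/`except` fallback to the built-in `zip` is an addition so the same code also runs on Python 3:

```python
import os
import shutil
import tempfile

try:
    from itertools import izip as lazy_zip  # Python 2: lazy pairing
except ImportError:
    lazy_zip = zip  # Python 3: the built-in zip() is already lazy


def paired_selected_lines(path_one, path_two, step=4, offset=1):
    """Yield (index, line_one, line_two) for each 0-based index with
    index % step == offset, reading both files one line at a time."""
    with open(path_one, "r") as f1, open(path_two, "r") as f2:
        for index, (line_one, line_two) in enumerate(lazy_zip(f1, f2)):
            if index % step == offset:
                yield index, line_one.rstrip("\n"), line_two.rstrip("\n")


# Usage example with two small throwaway files (made-up demo data).
demo_dir = tempfile.mkdtemp()
demo_one = os.path.join(demo_dir, "one.txt")
demo_two = os.path.join(demo_dir, "two.txt")
with open(demo_one, "w") as f:
    f.write("".join("a%d\n" % i for i in range(10)))
with open(demo_two, "w") as f:
    f.write("".join("b%d\n" % i for i in range(10)))

demo = list(paired_selected_lines(demo_one, demo_two))
shutil.rmtree(demo_dir)  # remove the throwaway files
print(demo)  # [(1, 'a1', 'b1'), (5, 'a5', 'b5'), (9, 'a9', 'b9')]
```

Like `zip()`, `izip()` stops at the end of the shorter file, so if the two files have different line counts the extra lines are silently dropped rather than reported.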


2014-05-14 12:46:30

Yay, it works! Thank you very much! – TheBumpper

And in Python 3 you can use zip() directly (your second attempt would work as is). – jsbueno
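For Python 3, a sketch that swaps the modulo test for `itertools.islice()`, which steps over the unwanted pairs itself. The helper name `selected_pairs` is hypothetical, and it takes already-opened line iterables (open file objects in real use) so the demo can use small `io.StringIO` stand-ins instead of 6 GB files:

```python
import io
from itertools import islice


def selected_pairs(lines_one, lines_two, start=1, step=4):
    """Yield (line_one, line_two) at 0-based positions start, start + step, ...

    In Python 3, zip() is lazy and islice() discards the pairs in between
    without building a list, so memory use does not grow with file size.
    Pairing stops at the end of the shorter input.
    """
    for line_one, line_two in islice(zip(lines_one, lines_two), start, None, step):
        yield line_one.rstrip("\n"), line_two.rstrip("\n")


# With real files you would pass open file objects, e.g.
#   with open(fileOne) as f1, open(fileTwo) as f2:
#       for line_R1, line_R2 in selected_pairs(f1, f2): ...
# Small in-memory stand-ins for the two files (made-up demo data):
demo_one = io.StringIO("".join("r1_line%d\n" % i for i in range(10)))
demo_two = io.StringIO("".join("r2_line%d\n" % i for i in range(10)))
demo = list(selected_pairs(demo_one, demo_two))
print(demo)  # [('r1_line1', 'r2_line1'), ('r1_line5', 'r2_line5'), ('r1_line9', 'r2_line9')]
```

`islice()` still reads every line of both files; it only replaces the hand-written index check, not the I/O.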
