How to efficiently skip the first n lines of a file in Python?

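I am currently using a Python wrapper around a C++ script to process a large (15 GB) text file line by line. Essentially, it reads a line from input.txt, processes it, and writes the result to output.txt. I use a straightforward loop here (inp is input.txt opened for reading, out is output.txt opened for writing):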

for line in inp: 
    result = operate(line) 
    out.write(result)

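However, due to a problem in the C++ script, it has a certain failure rate, which causes the loop to shut down after roughly ten million iterations. This leaves me with an output file that covers only 10% of the input.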

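Since I cannot fix the original script, I would like to restart it where it stopped. I counted the lines in output.txt, created a second output file called output2.txt, and started the following code: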

k = 0 
for line in inp: 
    if k < 12123253: 
        k += 1 
    else: 
        result = operate(line) 
        out2.write(result) 
        k += 1

However, whereas counting the lines finishes within a minute, this approach takes several hours just to reach the specified line.

Why is this approach so inefficient? Is there a faster one? I am on a Windows PC with plenty of computing power (72 GB RAM, a good processor) and I am using Python 2.7.

Source

2016-04-13 Ezze

I think tell (record where you are) and seek (return to that point on your next run) may help you. http://stackoverflow.com/questions/3299213/python-how-can-i-open-a-file-and-specify-the-offset-in-bytes
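To illustrate the tell/seek idea from this comment, here is a minimal sketch on a throwaway temporary file. The helper name read_from_offset and the demo content are made up for illustration, and binary mode is an assumption chosen because byte offsets are most predictable there.

```python
import os
import tempfile

def read_from_offset(path, offset):
    """Return all lines from byte position `offset` to the end of the file."""
    with open(path, 'rb') as f:
        f.seek(offset)
        return f.readlines()

# Build a small demo file (hypothetical content).
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b"line 1\nline 2\nline 3\nline 4\n")

# First run: consume two lines, then remember where we stopped.
with open(demo_path, 'rb') as f:
    f.readline()
    f.readline()
    saved_offset = f.tell()  # byte position where "line 3" starts

# Later run: jump straight back to that position without re-reading.
remaining = read_from_offset(demo_path, saved_offset)
os.remove(demo_path)
```

The seek itself does not depend on how far into the file the offset is, which is what makes this attractive for a 15 GB input.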

I suggest you use itertools:

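import itertools 

start_line = 12123253  # number of lines already processed 
with open('input.txt') as f: 
    # islice skips the first start_line lines and yields the rest 
    result = itertools.islice(f, start_line, None) 
    for i in result: 
        out2.write(operate(i))  # do something with this line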

Source

2016-04-13 08:39:38 Francesco

This will still read the whole file up to the point of interest.
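If the byte offset was not recorded during the earlier run, the skipped part of the file has to be read once, as this comment points out. One way that may make this pass cheaper is to read large binary chunks and count newline bytes instead of iterating line by line. The sketch below assumes every line ends in a newline byte; the function name offset_of_line and the default chunk size are illustrative choices, not something taken from the answers.

```python
def offset_of_line(path, n, chunk_size=1 << 20):
    """Return the byte offset at which line number n (0-based) starts.

    The file is read in binary chunks and newline bytes are counted,
    so no per-line string objects are created for the skipped lines.
    """
    if n <= 0:
        return 0
    seen = 0    # newlines counted in earlier chunks
    offset = 0  # byte offset of the start of the current chunk
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                raise ValueError(
                    "file has fewer than %d newline-terminated lines" % n)
            count = chunk.count(b'\n')
            if seen + count < n:
                seen += count
                offset += len(chunk)
                continue
            # The n-th newline lies inside this chunk: walk to it.
            idx = -1
            for _ in range(n - seen):
                idx = chunk.index(b'\n', idx + 1)
            return offset + idx + 1
```

The returned offset could then be passed to seek() on the input file, opened with 'rb', before entering the normal processing loop.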

You can use file.seek and file.tell. Here is some sample (pseudo) code:

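def seralizebreakpoint(pos): 
    '''store pos (a byte offset) somewhere persistent, e.g. in a small file''' 
    pass 

def desearializebreakpoint(): 
    '''return -1 if there is actually no break point''' 
    return -1  # placeholder: read back the offset stored by seralizebreakpoint 

def process(inp): 
    pos = inp.tell() 
    # readline() rather than "for line in inp": file iteration reads ahead, 
    # so tell() inside such a loop does not match the current line 
    for line in iter(inp.readline, b''): 
        try: 
            result = operate(line) 
            pos = inp.tell()  # start of the next unprocessed line 
        except: 
            seralizebreakpoint(pos)  # offset of the line that failed 
            raise 

def processEntry(pathtoinput): 
    bp = desearializebreakpoint() 
    # binary mode: on Windows, tell() can return wrong values in text mode 
    # for files with Unix-style line endings 
    with open(pathtoinput, 'rb') as inp: 
        if bp > -1: 
            inp.seek(bp) 
        process(inp)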
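The pseudocode above saves the position only when operate raises a Python exception. If the failure instead takes the whole process down (an assumption, since the question does not say how the loop ends), a periodic checkpoint may be more robust. The following is a minimal sketch of that variant, not the answerer's method: the names load_checkpoint, save_checkpoint, process_with_checkpoints, ckpt_path and every are hypothetical, and operate is assumed to accept and return byte strings because the files are opened in binary mode.

```python
import os

def load_checkpoint(ckpt_path):
    """Return (input_offset, output_offset), or (0, 0) if none was saved."""
    if not os.path.exists(ckpt_path):
        return 0, 0
    with open(ckpt_path) as ck:
        in_off, out_off = ck.read().split()
    return int(in_off), int(out_off)

def save_checkpoint(ckpt_path, in_off, out_off):
    """Store both byte offsets in a small text file."""
    with open(ckpt_path, 'w') as ck:
        ck.write('%d %d' % (in_off, out_off))

def process_with_checkpoints(in_path, out_path, ckpt_path, operate,
                             every=100000):
    """Run operate over in_path, resuming from the last checkpoint if any."""
    in_off, out_off = load_checkpoint(ckpt_path)
    if not os.path.exists(out_path):
        in_off, out_off = 0, 0  # no output yet: start from the beginning
    mode = 'r+b' if os.path.exists(out_path) else 'wb'
    with open(in_path, 'rb') as inp, open(out_path, mode) as out:
        inp.seek(in_off)
        out.seek(out_off)
        out.truncate()  # drop output written after the last checkpoint
        done = 0
        # readline() keeps tell() accurate, unlike "for line in inp"
        for line in iter(inp.readline, b''):
            out.write(operate(line))
            done += 1
            if done % every == 0:
                out.flush()
                save_checkpoint(ckpt_path, inp.tell(), out.tell())
        out.flush()
        save_checkpoint(ckpt_path, inp.tell(), out.tell())
```

Saving the output offset together with the input offset lets a restart discard any results written after the last checkpoint, so no line ends up in the output twice. A smaller value of every loses less work per failure at the cost of more frequent flushes.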

Source

2016-04-13 08:56:17
