Looping through a file to extract lines

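I am very new to Python. I am trying to extract specific lines from a text file, skipping the header lines that repeat at regular intervals, and write them to another file. I have been able to do this with the code below, but it is very slow.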

with open('test.txt', encoding='latin1') as rf:
    with open('test1.txt', 'w') as wf:
        for x, line in enumerate(rf):  # x is the zero-based line index
            # nskip = 4  # number of headers to skip
            # nloop = 5  # number of loops in the file
            ndata = 7  # number of lines in each loop
            data = 3  # number of lines to be extracted (the last 3 of each loop)
            x += 1  # switch to a one-based line number
            # print(x, line)

            # Compare the line number against every wanted line number.
            for i in range(1, ndata + 1):
                for j in range((ndata * i - data) + 1, ndata * i + 1):
                    if x == j:
                        # print(line)
                        wf.write(line)

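For example, with this code I get Line5, Line6, Line7, Line12, Line13, Line14, Line19, Line20, Line21 (assuming the test file contains Line1, Line2, Line3, and so on, one per line). The problem is that my real file is much larger, so this takes a lot of time and memory. There must be a faster, more Pythonic way to do this.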

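I would also like to add the loop number to each line, i.e. every line from the first loop gets a 1 somewhere on the line (for example Line5 1, Line6 1, Line7 1, Line12 2, Line13 2, Line14 2, Line19 3, and so on). What I actually want to do is somewhat more complicated than this, but this should pave the way. Thanks.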

Source

2016-11-16 Luck4u

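Since the headers and the records have fixed sizes, skip the given number of header lines, then write the given number of record lines, and repeat until the end of the file is reached.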

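n_header_lines = 25   # header lines at the start of each block
n_record_lines = 100  # record lines that follow each header
page_num = 0

with open('test.txt', encoding='latin1') as rf, open('test1.txt', 'w') as wf:
    try:
        while True:
            page_num += 1
            # Skip the header lines of this block.
            for _ in range(n_header_lines):
                next(rf)
            # Copy the record lines, prefixed with line and page numbers.
            for line_num in range(1, n_record_lines + 1):
                prefix = 'Line {:3d} {:3d} '.format(line_num, page_num)
                wf.write(prefix + next(rf))
    except StopIteration:
        # next(rf) raises StopIteration once the file is exhausted.
        pass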
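
For comparison, the same fixed-size layout can also be handled with plain arithmetic on the line index instead of explicit next() calls. The following is only a sketch under assumptions: the function name extract_records, the in-memory io.StringIO test data, and the choice to append the loop number at the end of each line are illustrative, not something taken from the question or the answer above.

```python
import io


def extract_records(src, dst, n_header_lines, n_record_lines):
    """Copy only the record lines from src to dst, appending the loop number.

    Assumes every block is exactly n_header_lines + n_record_lines long.
    """
    block = n_header_lines + n_record_lines
    for index, line in enumerate(src):
        # page: zero-based block number; offset: position inside the block.
        page, offset = divmod(index, block)
        if offset >= n_header_lines:
            dst.write('{} {}\n'.format(line.rstrip('\n'), page + 1))


# Hypothetical test data: Line1 .. Line21, one per line.
src = io.StringIO(''.join('Line{}\n'.format(n) for n in range(1, 22)))
dst = io.StringIO()
extract_records(src, dst, n_header_lines=4, n_record_lines=3)
result = dst.getvalue().splitlines()
print(result)
```

Each line is examined once and nothing is held in memory beyond the current line, so the same function should work unchanged with real file objects opened via open().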

Source

2016-11-16 18:50:24

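I tried this with n_header_lines = 4 and n_record_lines = 3 on a test file with 7 lines in each loop. The output is lines 5, 7, 9, 15, 17, 19, 25, 27, 29, and so on; I expected 5, 6, 7, 12, 13, 14, 19, 20, 21, and so on. So the code was not that useful. Any other suggestions? Thanks for the post. – Luck4u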

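@Luck4u: Skipping every other line like that suggests you are consuming 'rf' somewhere other than the intended 'next(rf)' calls. Make sure you don't have any code like 'for line in rf:'. Also remember that every call to 'next(rf)' consumes one line. If you call it twice inside the for loop, it will consume lines twice as fast as expected. What I'm saying is that the concept is sound and your implementation is lacking. If you want me to look at it, add it to your question. –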
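
The symptom described in these comments can be reproduced in isolation. The sketch below is a guess at what might have happened, not the asker's actual code: the helper copy_records and its extra_next flag are hypothetical, and the file is replaced by an in-memory io.StringIO with 21 numbered lines. With one stray 'next(rf)' per record, every second line is silently discarded.

```python
import io


def copy_records(src, n_header_lines, n_record_lines, extra_next=False):
    """Return the record lines of src; extra_next simulates a stray next()."""
    kept = []
    try:
        while True:
            for _ in range(n_header_lines):
                next(src)  # skip one header line
            for _ in range(n_record_lines):
                kept.append(next(src).strip())
                if extra_next:
                    next(src)  # hypothetical second call: eats one more line
    except StopIteration:
        pass  # end of input reached
    return kept


def sample():
    # Hypothetical test data: the numbers 1 .. 21, one per line.
    return io.StringIO(''.join('{}\n'.format(n) for n in range(1, 22)))


correct = copy_records(sample(), 4, 3)
doubled = copy_records(sample(), 4, 3, extra_next=True)
print(correct)
print(doubled)
```

The first call yields the expected 5, 6, 7, 12, 13, 14, 19, 20, 21, while the second yields 5, 7, 9, 15, 17, 19, which matches the pattern reported above.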

I just edited the question. Hopefully it makes more sense now. Thanks! – Luck4u
