Read a file line by line, but in reverse (last line first, then the next-to-last line, etc.)

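I want to remove trailing blank lines from a file, if there are any. Currently I do this by reading the whole file into memory, removing the blank lines there, and overwriting the file. The file is large, though (30,000+ lines, and the lines are long), so this takes 2-3 seconds.

So I want to read the file line by line, but backwards, until I reach the first non-blank line. That is, I start with the last line, then the next-to-last, and so on. Then I would just truncate the file instead of overwriting it.

What is the best way to read it in reverse? Right now I'm thinking of reading a 64k block, looping through the string character by character from the end until I have a line, and, when I run out of the 64k, reading another 64k and prepending it, and so on.

I assume there is no standard function or library for reading in reverse order?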
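The plan described above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: the function name `strip_trailing_blank_lines` and its `blocksize` parameter are made up for illustration, "blank" is taken to mean a line containing nothing but its line ending (whitespace-only lines are not treated as blank), and the line terminator of the last non-blank line is kept.

```python
import os


def strip_trailing_blank_lines(path, blocksize=64 * 1024):
    """Truncate the file after its last non-blank line.

    Hypothetical helper: returns the number of bytes removed.
    """
    with open(path, "rb+") as f:  # binary read/write, so truncate() works
        f.seek(0, os.SEEK_END)
        size = end = f.tell()

        # Walk backwards block by block until a block contains
        # something other than line-ending characters.
        while end > 0:
            start = max(end - blocksize, 0)
            f.seek(start)
            block = f.read(end - start)
            stripped = block.rstrip(b"\r\n")
            if stripped:
                end = start + len(stripped)
                break
            end = start

        # Keep the terminator of the last non-blank line, if it has one.
        if end > 0:
            f.seek(end)
            terminator = f.read(2)
            if terminator == b"\r\n":
                end += 2
            elif terminator[:1] in (b"\n", b"\r"):
                end += 1

        f.truncate(end)
    return size - end
```

Only the tail of the file is read and nothing is rewritten, so the cost depends on the number of trailing blank lines rather than on the size of the file.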


2014-09-22 sashoalm

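How many blank lines do you expect? Thousands? Each one is probably just a single newline character, so I think even 64k bytes might be overkill. – Blckknght 2014-09-22 08:51:46

It might be, but compared with reading everything into memory it is still a pretty drastic optimization. – sashoalm 2014-09-22 08:53:09

There is no built-in function to do this, but I had to write a class for it. I'll see if I can get permission to publish it. – 2014-09-22 08:59:19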

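Here is some code, a modified version of something I found elsewhere (probably here on Stack Overflow, actually...). I've extracted the two key methods that handle reading backwards.

The reversed_blocks iterator reads the file backwards in blocks of whatever size you like. The reversed_lines iterator splits each block into lines and saves the first line of the block, which may be only part of a line. If the next block (the one before it in the file) ends with a newline, the saved line is returned as a complete line; if not, the saved partial line is appended to the last line of the new block, completing the line that was split across the block boundary.

All the state is maintained by Python's iterator mechanism, so we don't have to store state anywhere ourselves; this also means that you can read several files backwards at once if you need to, since the state is bound to each iterator.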

import os  # needed for os.SEEK_END and os.SEEK_SET

def reversed_lines(self, file):
    "Generate the lines of file in reverse order."
    # The file must be opened in binary mode ('rb'): reversed_blocks seeks to
    # computed byte offsets, which text-mode files do not reliably support.
    # Lines are yielded as bytes, without their line endings.
    newline_chars = (b'\r', b'\n')
    tail = None                 # first line of the previously read block
    prev_began_with_lf = False  # did that block start with b'\n'?
    for block in self.reversed_blocks(file):
        if block:
            # First split the whole block into lines and reverse the list
            reversed_lines = block.splitlines()
            reversed_lines.reverse()

            if tail is not None:
                # If the last char of the block is not a newline, then the last line
                # crosses a block boundary, and the tail (possible partial line from
                # the previous block) should be added to it.
                if block[-1:] not in newline_chars:
                    reversed_lines[0] = reversed_lines[0] + tail

                # Otherwise, the block ended on a line boundary, and the tail is a
                # complete line itself (possibly an empty one). The exception is a
                # '\r\n' pair split across the two blocks: the tail is then only
                # the empty string in front of the '\n' and must be dropped.
                elif not (block.endswith(b'\r') and prev_began_with_lf):
                    reversed_lines.insert(0, tail)

            # Within the current block, we can't tell if the first line is complete
            # or not, so we extract it and save it for the next go-round with a new
            # block. We yield instead of returning so all the internal state of this
            # iteration is preserved (how many lines returned, current tail, etc.).
            prev_began_with_lf = block.startswith(b'\n')
            tail = reversed_lines.pop()

            for reversed_line in reversed_lines:
                yield reversed_line

    # We're out of blocks now; if there's a tail left over from the last block we read,
    # it's the very first line in the file. Yield that and we're done.
    if tail is not None:
        yield tail

def reversed_blocks(self, file, blocksize=4096):
    "Generate blocks of file's contents in reverse order."

    # Jump to the end of the file, and save the file offset.
    file.seek(0, os.SEEK_END)
    here = file.tell()

    # When the file offset reaches zero, we've read the whole file.
    while 0 < here:
        # Compute how far back we can step; either there's at least one
        # full block left, or we've gotten close enough to the start that
        # we'll read the whole file.
        delta = min(blocksize, here)

        # Back up to there and read the block; we yield it so that the
        # variable containing the file offset is retained.
        file.seek(here - delta, os.SEEK_SET)
        yield file.read(delta)

        # Move the pointer back by the amount we just handed out. If we've
        # read the last block, "here" will now be zero.
        here -= delta

reversed_lines is an iterator, so you run it in a loop:

for line in self.reversed_lines(fh): 
    do_something_with_the_line(line)

The comments may be redundant, but they were useful to me while I was working out how the iterators do their job.
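The point about state being bound to the iterator can be seen in a small experiment. The sketch below is an illustration only: it assumes a standalone copy of reversed_blocks written as a plain function (no `self`), and uses in-memory `io.BytesIO` objects in place of real files.

```python
import io
import os


def reversed_blocks(file, blocksize=4096):
    """Yield the blocks of a binary file from last to first."""
    file.seek(0, os.SEEK_END)
    here = file.tell()
    while here > 0:
        delta = min(blocksize, here)
        file.seek(here - delta, os.SEEK_SET)
        yield file.read(delta)
        here -= delta


first = io.BytesIO(b"abcdef")
second = io.BytesIO(b"123456789")

# Each generator keeps its own "here" offset, so the two files can be
# read backwards in lockstep without any bookkeeping in the caller.
pairs = list(zip(reversed_blocks(first, 3), reversed_blocks(second, 3)))
print(pairs)  # [(b'def', b'789'), (b'abc', b'456')]
```

`zip` stops as soon as the shorter file runs out, which is why the block `b'123'` of the second file does not appear.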


2014-09-22 18:10:48

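import os

with open(filename, "rb+") as f:  # binary read/write mode, required for truncate()
    size = os.stat(filename).st_size
    f.seek(max(size - 4096, 0))   # avoid a negative offset on files under 4096 bytes
    block = f.read(4096)
    # Find amount to truncate
    f.truncate(...)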


2014-09-22 09:01:19 filmor

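By the way, you can use 'f.seek(-4096, 2)'. – sashoalm 2014-09-22 09:02:07

So you do know how to read a file from the end? Or did I misunderstand your question? You can easily get the amount to truncate by computing '4096 - len(block.rstrip())'. – filmor 2014-09-22 10:45:46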
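Put together, the single-block idea from this answer and the comments above might look like the sketch below. The function name `truncate_trailing_whitespace` is hypothetical, and `len(block)` is used in place of the fixed 4096 so that files shorter than one block also work. Note the assumptions: `rstrip()` removes all trailing whitespace, including the final newline of the last non-blank line, and only the last block is inspected, so more than `blocksize` bytes of trailing whitespace would not be removed completely.

```python
import os


def truncate_trailing_whitespace(path, blocksize=4096):
    """Cut trailing whitespace found in the last block of the file.

    Hypothetical helper: returns the number of bytes removed.
    """
    with open(path, "rb+") as f:
        size = os.fstat(f.fileno()).st_size
        n = min(blocksize, size)       # never seek before the start
        f.seek(-n, os.SEEK_END)        # same as f.seek(-n, 2)
        block = f.read(n)
        excess = len(block) - len(block.rstrip())
        f.truncate(size - excess)
    return excess
```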

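This gives you the blocks in reverse, but not the lines. See my iterator-based version in the other answer for a nice trick that keeps track of the block and line offsets, so you don't have to maintain them yourself. – 2014-09-22 18:14:36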
