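Here is some code that I found a modified version of elsewhere (probably here on this site, actually...). I've extracted the two key methods that handle reading backwards.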
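The `reversed_blocks` iterator reads the file backwards in chunks of whatever size you like. The `reversed_lines` iterator splits each block into lines and holds back the first line of the block, since it may be incomplete. If the next block ends with a newline, the saved line is returned as a complete line; if not, the saved partial line is appended to the last line of the new block, which completes the line that was split across the block boundary.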
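All of the state is maintained by Python's iterator mechanism, so we don't have to store it anywhere ourselves. Because the state is bound to each iterator, this also means you can read several files backwards at once if you need to.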
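To illustrate the point about state living in the iterator, here is a minimal sketch that walks two files backwards in lockstep. It uses a standalone function with the same seek-and-read logic as `reversed_blocks`; the in-memory `io.BytesIO` objects and the tiny block size are assumptions chosen only to keep the example self-contained.

```python
import io
import os


def reversed_blocks(file, blocksize=4):
    """Yield blocks of a binary file, starting from the end."""
    file.seek(0, os.SEEK_END)
    here = file.tell()
    while here > 0:
        delta = min(blocksize, here)
        file.seek(here - delta, os.SEEK_SET)
        yield file.read(delta)
        here -= delta


# Two hypothetical in-memory "files" standing in for real ones.
first = io.BytesIO(b"abcdefgh")
second = io.BytesIO(b"0123456789")

# Each generator remembers its own offset, so the two can be advanced
# alternately without any shared bookkeeping. zip stops when the
# shorter sequence of blocks runs out.
pairs = list(zip(reversed_blocks(first), reversed_blocks(second)))
```

Neither generator knows about the other; the `here` variable of each one is simply suspended between `yield` calls.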
    # Both methods belong to a class (not shown here).
    import os

    def reversed_lines(self, file):
        """Generate the lines of file in reverse order."""
        newline_char_set = set(['\r', '\n'])
        tail = ""
        for block in self.reversed_blocks(file):
            if block is not None and len(block) > 0:
                # First split the whole block into lines and reverse the list.
                reversed_lines = block.splitlines()
                reversed_lines.reverse()
                # If the last char of the block is not a newline, then the last
                # line crosses a block boundary, and the tail (possible partial
                # line from the previous block) should be added to it.
                if block[-1] not in newline_char_set:
                    reversed_lines[0] = reversed_lines[0] + tail
                # Otherwise, the block ended on a line boundary, and the tail
                # is a complete line itself.
                # Caveat: an empty tail looks the same as "no tail", so a blank
                # line that begins a block is skipped by this check.
                elif len(tail) > 0:
                    reversed_lines.insert(0, tail)
                # Within the current block, we can't tell if the first line is
                # complete or not, so we extract it and save it for the next
                # go-round with a new block. We yield instead of returning so
                # all the internal state of this iteration is preserved (how
                # many lines returned, current tail, etc.).
                tail = reversed_lines.pop()
                for reversed_line in reversed_lines:
                    yield reversed_line
        # We're out of blocks now; if there's a tail left over from the last
        # block we read, it's the very first line in the file. Yield that and
        # we're done.
        if len(tail) > 0:
            yield tail

    def reversed_blocks(self, file, blocksize=4096):
        """Generate blocks of file's contents in reverse order."""
        # Jump to the end of the file, and save the file offset.
        file.seek(0, os.SEEK_END)
        here = file.tell()
        # When the file offset reaches zero, we've read the whole file.
        while 0 < here:
            # Compute how far back we can step; either there's at least one
            # full block left, or we've gotten close enough to the start that
            # we'll read the rest of the file.
            delta = min(blocksize, here)
            # Back up to there and read the block; we yield it so that the
            # variable containing the file offset is retained.
            file.seek(here - delta, os.SEEK_SET)
            yield file.read(delta)
            # Move the pointer back by the amount we just handed out. If we've
            # read the last block, "here" will now be zero.
            here -= delta
`reversed_lines` is an iterator, so you just run it in a loop:

    for line in self.reversed_lines(fh):
        do_something_with_the_line(line)
The comments are probably superfluous, but they were useful to me while I was working out how the iterators do their job.
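For anyone who wants to try the idea outside a class, the following is a rough standalone sketch rather than the code above. It assumes Python 3, a file opened in binary mode, and LF (`\n`) line endings; the name `reversed_lines_py3` is made up for this example. Instead of splitting with `splitlines()`, it prepends each new block to the carried-over partial line and splits on `b"\n"`, which also preserves blank lines that happen to sit on a block boundary.

```python
import io
import os
from itertools import islice


def reversed_lines_py3(file, blocksize=4096):
    """Yield the lines of a binary file last-to-first, without newlines.

    Assumes binary mode and LF line endings (a sketch, not a drop-in
    replacement for the methods above).
    """
    file.seek(0, os.SEEK_END)
    here = file.tell()
    if here == 0:
        return  # empty file: no lines at all
    tail = b""
    at_end = True
    while here > 0:
        delta = min(blocksize, here)
        here -= delta
        file.seek(here, os.SEEK_SET)
        # Prepend the new block to the partial line from the previous pass.
        buffer = file.read(delta) + tail
        if at_end:
            # A final newline ends the last line; it does not start a new one.
            if buffer.endswith(b"\n"):
                buffer = buffer[:-1]
            at_end = False
        parts = buffer.split(b"\n")
        # parts[0] may continue into the preceding block, so hold it back.
        tail = parts[0]
        for line in reversed(parts[1:]):
            yield line
    # Whatever is left is the first line of the file.
    yield tail


# Hypothetical data; the small block size forces lines to straddle blocks.
log = io.BytesIO(b"first\nsecond\n\nfourth\n")
last_three = list(islice(reversed_lines_py3(log, blocksize=5), 3))
```

Because the generator is lazy, `islice` stops reading as soon as it has the lines it asked for, so only the end of a large file is ever touched.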
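How many empty lines do you expect? Thousands? Each one is probably just a single newline character, so I think even 64 KB might be overkill. – Blckknght 2014-09-22 08:51:46
It might be, but it's still a very drastic optimization compared with reading everything into memory. – sashoalm 2014-09-22 08:53:09
There is no built-in function for this; I had to write a class for it. I'll see if I can get permission to post it. – 2014-09-22 08:59:19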