2013-06-28 180 views
6

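Read previous line in a file (Python)

I need to get the value of the previous line in a file and compare it with the current line while iterating over the file. The file is huge, so I can't read it all at once, and I can't randomly access lines by number with linecache either, because that library function still reads the entire file into memory.

Edit: I'm sorry, I forgot to mention that I have to read the file backwards.

EDIT2

I have tried the following: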

import linecache

f = open("filename", "r")
for line in reversed(f.readlines()):  # doesn't work: too many lines to read into memory
    pass

line = linecache.getline("filename", num_line)  # doesn't work either, for the same reason
+1

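Do you mean just the line immediately before? Can't you simply save it?

+2

You're more likely to get help if you show us what you've written so far. – That1Guy

+0

Can you show what you've tried? Looping over a file line by line and assigning each line to a variable is possible, so what exactly is going wrong? By the way, how big is HUGE? – ChrisP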

Answers

12

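Just save the previous line as you iterate to the next one:

prevLine = ""
with open("filename", "r") as f:
    for line in f:
        # do some work here, e.g. compare line with prevLine
        prevLine = line  # remember this line for the next iteration

This keeps the previous line in prevLine while you loop.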

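Edit: apparently the OP needs to read this file backwards:

aaand after about an hour of research I failed multiple times to do this within the memory constraints.

Here you go, Lim. That guy knows what he's doing; here is his best idea: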

General approach #2: Read the entire file, store position of lines

With this approach, you also read through the entire file once, but instead of storing the entire file (all the text) in memory, you only store the binary positions inside the file where each line started. You can store these positions in a similar data structure as the one storing the lines in the first approach.

Whenever you want to read line X, you have to re-read the line from the file, starting at the position you stored for the start of that line.

Pros: Almost as easy to implement as the first approach
Cons: Can take a while to read large files
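The approach quoted above could be sketched roughly as follows. This is a minimal illustration, not the linked author's code: the function names `index_line_offsets` and `reversed_lines` are made up for this example, and it assumes the file is UTF-8 text with `\n` line endings. The index still costs about 8 bytes per line, so it only helps when the lines themselves are much larger than that.

```python
from array import array


def index_line_offsets(path):
    """Scan the file once and record the byte offset where each line starts."""
    offsets = array("q")  # compact 64-bit integers instead of a list of ints
    position = 0
    with open(path, "rb") as handle:  # binary mode so len() counts real bytes
        for raw_line in handle:
            offsets.append(position)
            position += len(raw_line)
    return offsets


def reversed_lines(path, encoding="utf-8"):
    """Yield the file's lines from last to first, holding one line at a time."""
    offsets = index_line_offsets(path)
    with open(path, "rb") as handle:
        for start in reversed(offsets):
            handle.seek(start)  # jump to the stored start of this line
            yield handle.readline().decode(encoding).rstrip("\r\n")
```

Because `reversed_lines` is a generator, the save-the-previous-line pattern works on it unchanged: keep the last yielded line in a variable and compare it with the next one as you loop.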

+0

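Thank you very much. But I forgot to mention that I have to read the file backwards.

+0

@LimH. I added code to loop backwards :D – Stephan

+0

Magic. I'm new to Python, and although I knew files are iterable, using [::-1] never occurred to me. Thanks.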

2

I would write a simple generator for the task:

def pairwise(fname):
    with open(fname) as fin:
        prev = next(fin, None)  # default avoids StopIteration on an empty file
        for line in fin:
            yield prev, line
            prev = line

Alternatively, you can use the pairwise recipe from the itertools documentation:

import itertools

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)  # on Python 2, use itertools.izip to stay lazy
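As a rough illustration of how the recipe might be applied to the question, the sketch below feeds any iterable of lines (an open file object, for instance) through `pairwise` and reports where a line differs from the one before it. The helper `changed_lines` and the "differs" comparison are assumptions made for this example; substitute whatever comparison you actually need. On Python 3.10 and later, `itertools.pairwise` is built in and can replace the recipe.

```python
import itertools


def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    first, second = itertools.tee(iterable)
    next(second, None)  # advance the second iterator by one item
    return zip(first, second)


def changed_lines(lines):
    """Yield (line_number, previous, current) where a line differs from its predecessor."""
    # The first pair is (line 1, line 2), so numbering of `current` starts at 2.
    for number, (previous, current) in enumerate(pairwise(lines), start=2):
        if previous != current:
            yield number, previous, current
```

Passing an open file, as in `changed_lines(open_file)`, keeps only a couple of lines in memory at a time because both `tee` and `zip` consume the file lazily.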
4

@Lim, here is how I would write it (in reply to a comment):

def do_stuff_with_two_lines(previous_line, current_line):
    print("--------------")
    print(previous_line)
    print(current_line)

with open('my_file.txt', 'r') as my_file:
    # Read the first line up front so the loop always has a previous line.
    current_line = my_file.readline()

    for line in my_file:
        previous_line = current_line
        current_line = line

        do_stuff_with_two_lines(previous_line, current_line)
+0

Thank you. I'm very sorry, but I forgot to mention that I have to read the file backwards.