2013-08-04 101 views
0

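Separating odd and even lines using Python

I have a huge data file (~2 GB) that needs to be split into odd and even lines, which are then processed separately and written to two files. I don't want to read the whole file into RAM, so I think a generator should be a suitable choice. In short, I want to do something like this: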

lines = (l.strip() for l in open(inputfn)) 
oddlines = somefunction(getodds(lines)) 
evenlines = somefunction(getevens(lines)) 
outodds.write(oddlines) 
outevens.write(evenlines) 

Is this possible? Obviously, indexing will not work:

In [75]: lines[::2] 
--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
/home/kaiyin/Phased/build37/chr22/segments/segment_1/<ipython-input-75-97be680d00e3> in <module>() 
----> 1 lines[::2] 

TypeError: 'generator' object is not subscriptable 

Answers

2
def oddlines(fileobj): 
    return (line for index,line in enumerate(fileobj) if index % 2) 

def evenlines(fileobj): 
    return (line for index,line in enumerate(fileobj) if not index % 2) 

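Note that this requires scanning the file twice, since the two generators are not designed to run in parallel. It does, however, result in much less code. (Also note that the "odd" lines here are the ones with index 1, 3, 5; because of zero-based indexing, the first line of the file counts as an "even" line.)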

As Ashwini says, you can also use itertools.islice to do this.
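To make the zero-based naming concrete, here is a small sketch that runs the two functions on an in-memory file. io.StringIO stands in for a real file object, and the sample text is made up for illustration. Each function gets its own fresh stream, which mirrors the "scan the file twice" cost noted above.

```python
import io

def oddlines(fileobj):
    # Keep lines whose zero-based index is odd (the 2nd, 4th, ... lines).
    return (line for index, line in enumerate(fileobj) if index % 2)

def evenlines(fileobj):
    # Keep lines whose zero-based index is even (the 1st, 3rd, ... lines).
    return (line for index, line in enumerate(fileobj) if not index % 2)

# Hypothetical sample data; io.StringIO stands in for a real file object.
sample = "first\nsecond\nthird\nfourth\n"

# Each call consumes its own stream, i.e. the input is read twice.
evens = [line.strip() for line in evenlines(io.StringIO(sample))]
odds = [line.strip() for line in oddlines(io.StringIO(sample))]

print(evens)  # ['first', 'third']
print(odds)   # ['second', 'fourth']
```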

+0

This is nice and easy to understand. Only the function names should be swapped, since Python indexing starts counting from 0. :D Thanks! – qed

1

Use itertools.islice to slice the iterator:

from itertools import islice
with open('filename') as f1, open('evens.txt', 'w') as f2:
    for line in islice(f1, 0, None, 2):
        f2.write(line)

with open('filename') as f1, open('odds.txt', 'w') as f2:
    for line in islice(f1, 1, None, 2):
        f2.write(line)

0

If you want to read the file only once, write a generator that wraps the file and yields, along with each line actually read from the file, a flag indicating whether that line is odd or even.

def oddeven(f, even=True):
    for line in f:
        yield even, line
        even = not even

Usage:

with open("infile.txt") as infile, \
     open("odd.txt", "w") as oddfile, \
     open("even.txt", "w") as evenfile:
    for even, line in oddeven(infile):
        if even:
            evenfile.write(line)
        else:
            oddfile.write(line)

This can be simplified further by storing the output file objects in an indexable container:

with open("infile.txt") as infile, \
     open("odd.txt", "w") as oddfile, \
     open("even.txt", "w") as evenfile:
    outfiles = (oddfile, evenfile)
    for even, line in oddeven(infile):
        outfiles[even].write(line)
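The indexing trick works because bool is a subclass of int in Python, so False and True behave as the indices 0 and 1. A minimal sketch of that behaviour, with io.StringIO objects standing in for the real files and made-up input lines (both are assumptions for illustration):

```python
import io

def oddeven(f, even=True):
    # Yield (flag, line) pairs; the flag alternates, starting with `even`.
    for line in f:
        yield even, line
        even = not even

# In-memory stand-ins for the real files (assumption for illustration).
infile = io.StringIO("a\nb\nc\nd\ne\n")
oddfile, evenfile = io.StringIO(), io.StringIO()

# False selects index 0 (oddfile), True selects index 1 (evenfile).
outfiles = (oddfile, evenfile)
for even, line in oddeven(infile):
    outfiles[even].write(line)

even_text = evenfile.getvalue()
odd_text = oddfile.getvalue()
```

Note that the order of the tuple matters: putting evenfile first would silently swap the two outputs.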
+0

I don't see any real benefit over just using the built-in 'enumerate()' directly, e.g. 'for i, line in enumerate(infile): if i % 2 == 0: ...' –

+0

Yes, you guessed right. – kindall
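For comparison, the enumerate()-based variant suggested in the comment above might look roughly like this. The function name split_odd_even is hypothetical, and io.StringIO again stands in for real files. Like the oddeven generator, it reads the input only once.

```python
import io

def split_odd_even(infile, oddfile, evenfile):
    # Single pass: route each line by the parity of its zero-based index.
    for i, line in enumerate(infile):
        if i % 2 == 0:
            evenfile.write(line)
        else:
            oddfile.write(line)

# In-memory stand-ins for the real files (assumption for illustration).
source = io.StringIO("l0\nl1\nl2\nl3\n")
odd_out, even_out = io.StringIO(), io.StringIO()
split_odd_even(source, odd_out, even_out)

even_result = even_out.getvalue()
odd_result = odd_out.getvalue()
```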