Skipping 2 lines in a list comprehension

I'm trying to use a list comprehension to sort data from a very large file. The file structure is as follows:

THING 
info1 
info2 
info3 
THING 
info1 
info2 
info3

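...and so on.

Basically I'm trying to collect all the info1 lines into one list and all the info2 lines into another. I have an earlier script that does this, but it is slow. I'm also trying to make it object-oriented so I can work with the data more efficiently.

Old script: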

import re

info1_data = []
info2_data = []
with open(myfile) as f:
    for line in f:
        if re.search('THING', line):
            line = next(f)
            info1_data.append(line)
            line = next(f)
            info2_data.append(line)

New script:

def __init__(self, file):
    self.file = file

def sort_info1(self):
    with self.file as f:
        info1_data = [next(f) for line in f if re.search('THING', line)]
    return info1_data

def sort_info2(self):
    with self.file as f:
        info2_data = [next(f).next(f) for line in f if re.search('THING', line)]
    return info2_data

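The new script works for getting info1_data as a list. However, to get info2_data, I can't find anything that skips 2 lines with this approach. I guessed at next(f).next(f). It runs but doesn't produce anything.

Is this possible?

Many thanks.

With Moses's help I have this solution. islice is quite confusing though, and I don't fully understand it even after reading the Python docs. Does the iterable supply the data (i.e. info1 or info2), or do start, stop and step specify which data gets extracted?

islice(iterable, start, stop[, step])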

from itertools import islice
import re

class SomeClass(object):
    def __init__(self, file):
        self.file = file

    def search(self, word, i):
        self.file.seek(0)  # seek to start of file
        for line in self.file:
            if re.search(word, line) and i == 0:
                line = next(self.file)
                yield line
            elif re.search(word, line) and i == 1:
                line = next(self.file)
                line = next(self.file)
                yield line

    def sort_info1(self):
        return list(islice(self.search('THING', 0), 0, None, 2))

    def sort_info2(self):
        return list(islice(self.search('THING', 1), 2, None, 2))


info1 = SomeClass(open("test.dat")).sort_info1()
info2 = SomeClass(open("test.dat")).sort_info2()
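
On the islice question above: as far as the itertools docs describe it, the first argument is the iterable that supplies the items, while start, stop and step only select which positions are kept, much like the slice seq[start:stop:step]. A minimal sketch, where the list `stream` is a made-up stand-in for what a generator might yield:

```python
from itertools import islice

# Hypothetical stand-in for a generator yielding info1, info2, info1, info2, ...
stream = ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']

# The iterable supplies the data; start/stop/step pick positions from it.
evens = list(islice(stream, 0, None, 2))   # positions 0, 2, 4
odds = list(islice(stream, 1, None, 2))    # positions 1, 3, 5
first_three = list(islice(stream, 3))      # positions 0, 1, 2

print(evens, odds, first_three)
```

Under that reading, a generator that already yields only the info1 lines (as search does when i == 0) would not need a step of 2; that step would drop every other match.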


2017-08-07 matman9

Write your own 'next' function that takes the number of lines to skip as a second argument, defaulting to 1.
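
One possible reading of that suggestion, as a rough sketch: the helper name `skip_next` and its `default` parameter are made up for illustration, and io.StringIO stands in for the real file.

```python
import io
import re


def skip_next(iterator, n=1, default=None):
    """Advance `iterator` by n items and return the last one consumed.

    Returns `default` if the iterator runs out first. (Hypothetical helper.)
    """
    item = default
    for _ in range(n):
        item = next(iterator, default)
    return item


# io.StringIO stands in for the real file so the sketch is self-contained.
f = io.StringIO("THING\ninfo1\ninfo2\ninfo3\nTHING\ninfo1b\ninfo2b\ninfo3b\n")

# For each matching line, skip ahead two lines and keep the second one.
info2_data = [skip_next(f, 2) for line in f if re.search('THING', line)]
print(info2_data)
```

With n left at 1 the same comprehension would collect the info1 lines instead.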

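You should seek the file back to the start in order to repeat the search from the beginning of the file. Also, you can use a generator function to decouple the search operation from producing the data. Then use itertools.islice to step over the lines: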

import re
from itertools import islice

class SomeClass(object):
    def __init__(self, file):
        self.file = file

    def search(self, word):
        self.file.seek(0)  # seek to start of file
        for line in self.file:
            if re.search(word, line):
                # yield next two lines
                yield next(self.file)
                yield next(self.file)

    def sort_info1(self):
        return list(islice(self.search('THING'), 0, None, 2))

    def sort_info2(self):
        return list(islice(self.search('THING'), 1, None, 2))

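But instead of passing the file object, I would suggest passing the file path, so that the file can be closed after each use and you avoid holding on to resources when they are not (or not yet) needed.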
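
A rough sketch of what the path-based variant might look like. The class name `ThingFile` and the helper `_nth_line_after` are hypothetical, and a throwaway temporary file is used only so the example runs as-is:

```python
import os
import re
import tempfile


class ThingFile(object):
    """Hypothetical variant that stores a path and opens the file per call."""

    def __init__(self, path):
        self.path = path

    def _nth_line_after(self, word, n):
        # The file is opened here and closed as soon as the list is built.
        with open(self.path) as f:
            results = []
            for line in f:
                if re.search(word, line):
                    # Read n lines past the match; '' if the file ends early.
                    for _ in range(n):
                        line = next(f, '')
                    results.append(line)
            return results

    def sort_info1(self):
        return self._nth_line_after('THING', 1)

    def sort_info2(self):
        return self._nth_line_after('THING', 2)


# Small throwaway file so the sketch can be run as-is.
with tempfile.NamedTemporaryFile('w', suffix='.dat', delete=False) as tmp:
    tmp.write("THING\ninfo1\ninfo2\ninfo3\nTHING\ninfo1b\ninfo2b\ninfo3b\n")

data = ThingFile(tmp.name)
info1 = data.sort_info1()
info2 = data.sort_info2()
os.remove(tmp.name)
print(info1, info2)
```

Because each method opens the file itself, there is no need to seek back to the start between calls, and no file handle stays open between them.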


2017-08-07 12:20:25

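Thanks! I'm new to islice... I've added it to my script, but it just returns the search word 'THING' as a list rather than info1 or info2 as lists. I've looked through the Python docs but still don't quite follow it. – matman9

@matman9 Are you yielding the correct items from the generator function?

In the generator, should '..., 0, None, 2' return info1? Thanks – matman9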

You could do it like this:

def sort_info2(self):
    with self.file as f:
        info2_data = [(next(f), next(f))[1] for line in f if re.search('THING', line)]
    return info2_data

But it looks like a slightly weird way to do it!
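
A related variation, sketched under the assumption that both lists are wanted anyway: keep the whole (info1, info2) tuple instead of indexing into it, so one pass over the file fills both lists. io.StringIO stands in for the real file here.

```python
import io
import re

# io.StringIO stands in for the real file so the sketch is self-contained.
f = io.StringIO("THING\ninfo1\ninfo2\ninfo3\nTHING\ninfo1b\ninfo2b\ninfo3b\n")

# For each matching line, read the two lines that follow it as a pair.
# Tuple elements are evaluated left to right, so the order is info1, info2.
pairs = [(next(f), next(f)) for line in f if re.search('THING', line)]

# Split the pairs into the two lists.
info1_data = [first for first, second in pairs]
info2_data = [second for first, second in pairs]
print(info1_data, info2_data)
```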


2017-08-07 12:43:57 akhilsp
