Python: put specific lines of a file into a list

I ran into the following problem:

Given a file with the following structure:

'>some cookies 
chocolatejelly 
peanutbuttermacadamia 
doublecoconutapple 
'>some icecream 
cherryvanillaamaretto 
peanuthaselnuttiramisu 
bananacoffee 
'>some other stuff 
letsseewhatfancythings 
wegotinhere

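Aim: put all the lines that follow each line containing '>' into a list, as a single string per section.

So the function goes through each line of the file. If the line contains no '>', it strips the '\n' and concatenates the line onto the current string. If '>' occurs, it appends the concatenated string to the list and "clears" the string seq so that the next sequence can be concatenated.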

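Problem: with the example input file, it only puts the items from 'some cookies' and 'some icecream' into the list, but not those from 'some other stuff'. So we get the following result: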

[chocolatejelly 
peanutbuttermacadamia 
doublecoconutapple, cherryvanillaamaretto 
peanuthaselnuttiramisu 
bananacoffee]

instead of the expected

[chocolatejelly 
peanutbuttermacadamia 
doublecoconutapple, cherryvanillaamaretto 
peanuthaselnuttiramisu 
bananacoffee, letsseewhatfancythings 
wegotinhere]

What is wrong with my thinking here? There is probably some logic error in the iteration that I have overlooked, but I don't know where.

Thanks in advance for any hints!

2011-04-17 Daniyal

Apologies, and thanks to Manoj Govindan for the edit! – Daniyal 2011-04-17 14:39:54

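The problem is that you only save the current section seq when you hit a line with '>' in it. When the file ends, that last section is still open, but you never save it.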

The simplest way to fix your program is this:

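def parseSequenceIntoDictionary(filename):
    lis = []
    seq = ''
    with open(filename, 'r') as fp:
        for line in fp:
            if '>' not in line:
                # sequence line: strip the newline and concatenate
                seq += line.rstrip()
            else:
                # header line: store the previous section and start a new one
                lis.append(seq)
                seq = ''
    # the file ended: no header line follows the last section,
    # so it has to be stored explicitly
    lis.append(seq)
    # the first header stored an empty string; drop it if present
    # (a bare lis.remove('') raises ValueError when there is none)
    if '' in lis:
        lis.remove('')
    return lis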

By the way, you should use if line.startswith("'>"): so that a line which merely contains '>' somewhere is not mistaken for a header.

2011-04-17 15:28:12

"# store the last section" was the missing idea. Thanks a lot for the help, and for the suggestion to use line.startswith(str)! – Daniyal 2011-04-17 15:42:28
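
A different route, sketched here as an assumption rather than what the answer above does: with a startswith test for headers, itertools.groupby can cut the lines into alternating runs of header and non-header lines. Because groupby also emits the final run, the last section cannot be forgotten. The names is_header and sequences are hypothetical.

```python
from itertools import groupby


def is_header(line):
    # Hypothetical helper: header lines start with the "'>" marker from the example file
    return line.startswith("'>")


def sequences(lines):
    """Yield one concatenated string per section, leaving out the header lines."""
    # groupby cuts the input into consecutive runs of header / non-header lines
    for header, group in groupby(lines, key=is_header):
        if not header:
            # Join one run of sequence lines, dropping newlines and trailing spaces
            yield ''.join(line.rstrip() for line in group)


sample = [
    "'>some cookies\n", "chocolatejelly\n", "peanutbuttermacadamia\n",
    "'>some icecream\n", "bananacoffee\n",
    "'>some other stuff\n", "wegotinhere\n",
]
print(list(sequences(sample)))
# ['chocolatejellypeanutbuttermacadamia', 'bananacoffee', 'wegotinhere']
```

A file object iterates over its lines, so list(sequences(fp)) inside a with open(filename) as fp: block should behave the same way. One difference from the loop version: two consecutive header lines form a single run, so no empty string is produced for them.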

Well, you could simply split on "'>" (if I understand you correctly):

>>> s=""" 
... '>some cookies 
... chocolatejelly 
... peanutbuttermacadamia 
... doublecoconutapple 
... '>some icecream 
... cherryvanillaamaretto 
... peanuthaselnuttiramisu 
... bananacoffee 
... '>some other stuff 
... letsseewhatfancythings 
... wegotinhere """ 
>>> s.split("'>") 
['\n', 'some cookies \nchocolatejelly \npeanutbuttermacadamia \ndoublecoconutapple \n', 'some icecream \ncherryvanillaamaretto \npeanuthaselnuttiramisu \nbananacoffee \n', 'some other stuff \nletsseewhatfancythings \nwegotinhere '] 
>>>

2011-04-17 14:40:25 kurumi

This solution is appealing. But how can I force the split to happen after the line containing '>'? – Daniyal 2011-04-17 15:01:27
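
One possible answer to this comment, as a sketch: after splitting on "'>", each chunk begins with the rest of its header line, so str.partition('\n') can cut that line off before the sequence lines are joined. This assumes "'>" never occurs inside a sequence line; split_sections is a hypothetical name, and sections without any sequence lines are skipped.

```python
def split_sections(text):
    """Split on the "'>" marker and keep only the sequence part of each section."""
    result = []
    for chunk in text.split("'>"):
        # The chunk starts with the remainder of the header line; cut it off
        _header, _, body = chunk.partition('\n')
        # Join the sequence lines, dropping newlines and trailing spaces
        seq = ''.join(line.rstrip() for line in body.splitlines())
        if seq:  # skips the text before the first header and empty sections
            result.append(seq)
    return result


s = "'>some cookies \nchocolatejelly \ndoublecoconutapple \n'>some other stuff \nwegotinhere "
print(split_sections(s))
# ['chocolatejellydoublecoconutapple', 'wegotinhere']
```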

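You only append seq to the result list when you find a new line containing '>'. So at the end you still have a filled seq (your missing data), but you never add it to the result list. After your loop, add seq to the list if it contains any data, and you should be fine.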

2011-04-17 14:41:02 Achim

Ah, I see, but how do I add seq only if there is some data in it? – Daniyal 2011-04-17 15:02:46
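
A minimal sketch of what Achim describes, assuming headers start with "'>": an empty string is falsy in Python, so if seq: is enough to append seq only when it holds data, both when a new header arrives and once more after the loop. The name parse_sections is hypothetical, and a header with no sequence lines under it is silently dropped.

```python
def parse_sections(lines):
    """Return one concatenated sequence per "'>" section (hypothetical name)."""
    sections = []
    seq = ''
    for line in lines:
        if line.startswith("'>"):
            # A new header closes the previous section, but only if it has data
            if seq:
                sections.append(seq)
            seq = ''
        else:
            seq += line.rstrip()
    # The input ended: store the last section, again only if it has data
    if seq:
        sections.append(seq)
    return sections


lines = ["'>some cookies\n", "chocolatejelly\n", "'>some icecream\n",
         "bananacoffee\n", "'>some other stuff\n", "wegotinhere"]
print(parse_sections(lines))
# ['chocolatejelly', 'bananacoffee', 'wegotinhere']
```

With a real file, passing the open file object (with open(filename) as fp: parse_sections(fp)) should work the same way.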

my_list = []
with open('file_in.txt') as f:
    for line in f:
        if line.startswith("'>"):
            # keep only the header text after the "'>" marker
            my_list.append(line.strip().split("'>")[1])

print(my_list)  # ['some cookies', 'some icecream', 'some other stuff']

2011-04-17 15:14:53 snippsat
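
If the header names collected here should stay attached to their sequences (the question's function name, parseSequenceIntoDictionary, hints at a dictionary, although the question itself asks for a list), the two ideas can be combined. This is a sketch under assumptions: parse_to_dict is a hypothetical name, lines before the first header are ignored, and a repeated header name would overwrite the earlier section. Because each sequence line is added directly to the current entry, there is no separate "store the last section" step to forget.

```python
def parse_to_dict(lines):
    """Map each header name (the text after "'>") to its concatenated sequence."""
    result = {}
    name = None
    for line in lines:
        if line.startswith("'>"):
            # Same header extraction as in the answer above
            name = line.strip().split("'>")[1]
            result[name] = ''
        elif name is not None:
            # Sequence line: extend the entry opened by the latest header
            result[name] += line.rstrip()
    return result


lines = ["'>some cookies \n", "chocolatejelly \n", "peanutbuttermacadamia \n",
         "'>some other stuff \n", "wegotinhere "]
print(parse_to_dict(lines))
# {'some cookies': 'chocolatejellypeanutbuttermacadamia', 'some other stuff': 'wegotinhere'}
```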

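import re

def parseSequenceIntoDictionary(filename, regx=re.compile('^.*>.*$', re.M)):
    with open(filename) as f:
        # split the whole file on header lines (any line containing '>')
        for el in regx.split(f.read()):
            if el:
                # join the sequence lines of one section into a single string
                yield el.replace('\n', '')

print(list(parseSequenceIntoDictionary('aav.txt')))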

2011-04-17 17:22:12 eyquem
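
To try the regex approach without a file, the same pattern can be applied to a string. This variant (sections_from_text is a hypothetical name) strips every line and filters after joining. As a result, a chunk between two adjacent header lines, which consists only of '\n', is dropped instead of becoming an empty string, and trailing spaces like those in the example data are removed.

```python
import re

# Same pattern as above: any whole line containing '>' (re.M makes ^ and $ match per line)
HEADER = re.compile(r'^.*>.*$', re.M)


def sections_from_text(text):
    """Split text on header lines; return each non-empty section as one string."""
    sections = []
    for chunk in HEADER.split(text):
        # Strip every line of the chunk, then join them into a single string
        seq = ''.join(line.strip() for line in chunk.splitlines())
        if seq:  # drops the empty text before the first header and empty sections
            sections.append(seq)
    return sections


text = ("'>some cookies \nchocolatejelly \npeanutbuttermacadamia \n"
        "'>some icecream \n'>some other stuff \nletsseewhatfancythings \nwegotinhere")
print(sections_from_text(text))
# ['chocolatejellypeanutbuttermacadamia', 'letsseewhatfancythingswegotinhere']
```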
