How do I skip a matching pattern in Python?

I have the following lines in test.fa:

#test.fa 
>1 
AGAGGGAGCTG 
CCTCAGGGCTG 
CACTCAGGAAA 
TTGGGGCGCTG 
AGCATGGGGGG 
CAGGAGGGGCC

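I need to ignore the line that starts with ">" and join the lines that follow into a single string. However, the script below not only skips the line with ">", it also skips the next line before joining the remaining lines.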

#!/usr/bin/env python
import sys
import re
string = ""
with open("test.fa","rt") as f:
    for line in f:
        if re.match(">",line):
            line = f.next()
        else:
            line = line.rstrip("\n")
            string = string + line
print (string)

Can anyone help fix the script, or suggest a better way to do this? Thanks!

Source

2015-10-05 harsh

I'll take a look, thanks Jose – harsh

However, I'd suggest using Biopython for parsing FASTA (http://biopython.org/wiki/SeqIO#Sequence_Input) –

In any case, the for loop already advances to the next line on each iteration, so you don't actually need to do anything in the if block.

for line in f:
    if re.match(">",line):
        pass
    else:
        line = line.rstrip("\n")
        string = string + line

Or:

for line in f:
    if not re.match(">",line):
        line = line.rstrip("\n")
        string = string + line

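Further improvements: you don't need a regular expression to check which character a string starts with, and accumulating the lines in a list is generally recommended over repeated string concatenation.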

lines = []
for line in f:
    if not line.startswith(">"):
        lines.append(line.rstrip("\n"))
string = "".join(lines)

Or, as a one-liner:

string = "".join(line.rstrip("\n") for line in f if not line.startswith(">"))
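For anyone who wants to try this without a file on disk, here is a minimal sketch that wraps the same idea in a function. The name join_sequence_lines and the shortened sample data are made up for illustration, and it uses rstrip() with no argument on the assumption that trailing spaces should be dropped along with the newline.

```python
import io


def join_sequence_lines(handle):
    """Concatenate every line that does not start with '>' (hypothetical helper)."""
    # rstrip() with no argument removes the newline and any trailing spaces.
    return "".join(
        line.rstrip() for line in handle if not line.startswith(">")
    )


# io.StringIO stands in for the open file; iterating it yields lines like a file does.
sample = io.StringIO(">1\nAGAGGGAGCTG\nCCTCAGGGCTG\nCACTCAGGAAA\n")
print(join_sequence_lines(sample))  # AGAGGGAGCTGCCTCAGGGCTGCACTCAGGAAA
```

With a real file, the same function should work as `with open("test.fa") as f: string = join_sequence_lines(f)`.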

Source

2015-10-05 16:28:40 Kevin

Thanks everyone! All the solutions are similar, and thanks to Kevin for the extra improvements. Helpful for any beginner. – harsh

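You're basically advancing to the next line twice on every pass: the for loop already fetches the next line, and f.next() fetches another one. I suggest you go with this: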

#!/usr/bin/env python
import sys
import re
string = ""
with open("test.fa","rt") as f:
    for line in f:
        if not re.match(">",line):
            line = line.rstrip("\n")
            string = string + line
print (string)
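To see the double advance in isolation, here is a small sketch that runs both versions on an in-memory sample. The function names and the shortened data are invented for the demo, and it uses the built-in next(handle) because Python 3 file objects have no .next() method.

```python
import io

FASTA_TEXT = ">1\nAGAG\nCCTC\nCACT\nTTGG\n"


def buggy_concat(handle):
    """Mimics the question's script: an extra next() inside the loop."""
    result = ""
    for line in handle:
        if line.startswith(">"):
            # This consumes "AGAG\n"; the for loop then moves on to "CCTC\n",
            # so the first sequence line is silently dropped.
            line = next(handle)
        else:
            result += line.rstrip("\n")
    return result


def fixed_concat(handle):
    """Lets the for loop do all the advancing."""
    result = ""
    for line in handle:
        if not line.startswith(">"):
            result += line.rstrip("\n")
    return result


print(buggy_concat(io.StringIO(FASTA_TEXT)))  # CCTCCACTTTGG
print(fixed_concat(io.StringIO(FASTA_TEXT)))  # AGAGCCTCCACTTTGG
```

The only difference between the two functions is the extra next() call, which is what makes the first sequence line disappear.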

Source

2015-10-05 16:29:28 CollinD

You don't need

line = f.next()

That happens automatically when you iterate over the file. Just do this:

#!/usr/bin/env python
import sys
import re

string = ""
with open("test.fa","rt") as f:
    for line in f:
        if not re.match(">",line):
            line = line.rstrip("\n")
            string = string + line
print (string)

Source

2015-10-05 16:30:28 HumanCatfood
