-1
正則表達式我有以下文件命名seq.fasta:文本(蟒蛇)的多塊
>AAM15934.1| NtrX [Gluconacetobacter diazotrophicus]| NTRX1 | Response_reg - Sigma54_activat - HTH_8
MGHEILIVDDEPDIRLLVEGILRDEGYETRLAGDSDSAISAFRARRPSLVILDVWLQGSRLDGLGILQAI
QGEEPVVPTIMISGHGTIETAVAALQHGAYDFIEKPFQSDRLLLVVRRALEASRLARENAELRLRAGPEA
MLYGDSPVIAGVRNQIERVAPSGSRVLISGAAGAGKEVAARMIHARSPGPKAFIALNCATLAPGRFEEEL
FGIEGAPDGTGRRTGVLERAHGGTLLLDEVSDMPIETQGKIVRALQDQSFERVGGASRVKVDVRVLAATN
RDLQEAIAAGRFREDLYYRLAVVPLRVPSLRERREDIPGLARLFLRRAAENAGLPLRDLSGDAVAALQSY
DWPGNARELRNLMERLLIMMPGNGSDLIRAEMLPPSVGQGAPALLKFDPAADVMGLPLREARDLFETQYL
QAQLLRFGGNISRTAGFVGMERSALHRKLKQLGVTSEERGAG
>WP_002731145.1| NtrX [Phaeospirillum molischianum]| NTRX1 | Response_reg - Sigma54_activat - HTH_8
MAHDILIVDDEADIRVLIAGILEDEGHSTREAANADEALERIRARRPSLVIQDIWLQGSRLDGLGVLDEI
KREHPDVPVVMISGHGTIETAVQAIKQGAYDFIEKPFKADRLLLVVDRAIESARLKRENQELRVRSGSTG
DLVGISPALVQIRQTIERVAPTNSRVLITGPAGSGKEVAARMIHAHSRRTEGPFVVVNCAAMHPDRMEIE
LFGTEYGADGSTSPRKIGTFEQAHSGTLLLDEVADMPLETQGKIVRVLQDQTFERVGGGKRVEVDVRVIA
TTNRDLQSEMIAGHFREDLFYRLNVVPIRMPALRDGKEDIPLLARQFMQLAAQLAGVPPRPLGEDALAAL
QAYDWPGNVRQLRNAIDWLLIMAPGDWRDPVRADMLPSEIGAITPAVLRWEKSSEIMTLPLREARELFER
EYLLAQVNRFAGNISRTAAFVGMERSALHRKLKLLGINTDEKVR
>WP_002967695.1| NtrX [Brucella abortus]| NTRX1 | Response_reg - Sigma54_activat - HTH_8
MAADILVVDDEVDIRDLVAGILSDEGHETRTAFDADSALAAINDRAPRLVFLDIWLQGSRLDGLALLDEI
KKQHPELPVVMISGHGNIETAVSAIRRGAYDFIEKPFKADRLILVAERALETSKLKREVSDLRKRTGDQL
ELVGTSLAMNQLRQTIERVAPTNSRIMITGPSGAGKELVARTIHAQSSRANGPFVTVNAATITPERMEIE
LFGTEMDGGERKVGALEEAHGGILYLDEVADMPRETQNKILRVLVDQQFERVGGTKRVKVDVRIISSTAQ
NLEGMIAEGTFREDLFHRLSVVPVQVPALAARREDIPSLVEFFMKQIAEQAGIKPRKIGPDAMAVLQAHS
WPGNLRQLRNNVERLMILTRGDDPDELVTADLLPAEIGDTLPRAPTESDQHIMALPLREARERFEKEYLI
AQINRFGGNISRTAEFVGMERSALHRKLKSLGV
我想提出的每個字母塊在列表中。 例子:
列出內容:
List[0] = MGHEILIVDDEPDIRLLVEGILRDEGYETRLAGDSDSAISAFRARRPSLVILDVWLQGSRLDGLGILQAI
QGEEPVVPTIMISGHGTIETAVAALQHGAYDFIEKPFQSDRLLLVVRRALEASRLARENAELRLRAGPEA
MLYGDSPVIAGVRNQIERVAPSGSRVLISGAAGAGKEVAARMIHARSPGPKAFIALNCATLAPGRFEEEL
FGIEGAPDGTGRRTGVLERAHGGTLLLDEVSDMPIETQGKIVRALQDQSFERVGGASRVKVDVRVLAATN
RDLQEAIAAGRFREDLYYRLAVVPLRVPSLRERREDIPGLARLFLRRAAENAGLPLRDLSGDAVAALQSY
DWPGNARELRNLMERLLIMMPGNGSDLIRAEMLPPSVGQGAPALLKFDPAADVMGLPLREARDLFETQYL
QAQLLRFGGNISRTAGFVGMERSALHRKLKQLGVTSEERGAG
List[1] = MAHDILIVDDEADIRVLIAGILEDEGHSTREAANADEALERIRARRPSLVIQDIWLQGSRLDGLGVLDEI
KREHPDVPVVMISGHGTIETAVQAIKQGAYDFIEKPFKADRLLLVVDRAIESARLKRENQELRVRSGSTG
DLVGISPALVQIRQTIERVAPTNSRVLITGPAGSGKEVAARMIHAHSRRTEGPFVVVNCAAMHPDRMEIE
LFGTEYGADGSTSPRKIGTFEQAHSGTLLLDEVADMPLETQGKIVRVLQDQTFERVGGGKRVEVDVRVIA
TTNRDLQSEMIAGHFREDLFYRLNVVPIRMPALRDGKEDIPLLARQFMQLAAQLAGVPPRPLGEDALAAL
QAYDWPGNVRQLRNAIDWLLIMAPGDWRDPVRADMLPSEIGAITPAVLRWEKSSEIMTLPLREARELFER
EYLLAQVNRFAGNISRTAAFVGMERSALHRKLKLLGINTDEKVR
List[2] = MAADILVVDDEVDIRDLVAGILSDEGHETRTAFDADSALAAINDRAPRLVFLDIWLQGSRLDGLALLDEI
KKQHPELPVVMISGHGNIETAVSAIRRGAYDFIEKPFKADRLILVAERALETSKLKREVSDLRKRTGDQL
ELVGTSLAMNQLRQTIERVAPTNSRIMITGPSGAGKELVARTIHAQSSRANGPFVTVNAATITPERMEIE
LFGTEMDGGERKVGALEEAHGGILYLDEVADMPRETQNKILRVLVDQQFERVGGTKRVKVDVRIISSTAQ
NLEGMIAEGTFREDLFHRLSVVPVQVPALAARREDIPSLVEFFMKQIAEQAGIKPRKIGPDAMAVLQAHS
WPGNLRQLRNNVERLMILTRGDDPDELVTADLLPAEIGDTLPRAPTESDQHIMALPLREARERFEKEYLI
AQINRFGGNISRTAEFVGMERSALHRKLKSLGV
但我掙扎分裂,並把它們在列表中,我的代碼是這樣的:
import re
myfile = open('seq.fasta', 'r').read()
regex = re.compile(r'^>([^\n\r]+)[\n\r]([A-Z\n\r]+)', re.MULTILINE)
matches = [m.groups() for m in regex.finditer(myfile)]
for m in matches:
onlySequences = (m[1])
print(onlySequences)
變量onlySequences
返回剛剛過去的一個字母塊,我如何保留所有的人,每個人都在一個列表中?
您每次迭代'matches'時都要重寫'onlySequences'。 –
你不需要regex來做到這一點。 –