2013-06-18 50 views

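Extracting terms from FASTA headers

I need to parse FASTA headers for the following terms: leaves, buds, stems, and tender shoots. If a sequence's header contains any of these terms, I want to open a file and write the sequence to it, using Biopython.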

So I converted the records to a dictionary using SeqIO.to_dict:

from Bio import SeqIO 
records_dict = SeqIO.to_dict(SeqIO.parse("my_example.fasta","fasta")) 

But now I don't know how to pull the terms out of the headers. The sequences look like this:

>gi|393741877|gb|FS945568.1|FS945568 FS945568 tea plant lateral roots cDNA library Camellia sinensis cDNA clone LR29G09, mRNA sequence 
CCGGGGATCCATTCCAAAATTCATCATAAACCTCTCAATATTGTTCACTTGAAAAAAGATGA... 

>gi|393741878|gb|FS945569.1|FS945569 FS945569 tea plant lateral roots cDNA library Camellia sinensis cDNA clone LR29G11, mRNA sequence 
CCGGGGGCTATCGAGCACTCACCGACTCACTCGAGAGCTAATACAGTCCACAGC... 

>gi|393751846|gb|FS959695.1|FS959695 FS959695 tea plant young leaves cDNA library Camellia sinensis cDNA clone YL16A05, mRNA sequence 
CCAACAACTTCTTCCTAACACTACCACCTTCTGTCAACTTACTTCTCCAAAGGCTTCTTTCTTCCACCAT 
GGCTGCTTCTACCATGGCTCTCTCTTCCCCATCTTTCGCCGGAAAGGCGGTGAAACTTGCCCCGGAG... 

>gi|393751847|gb|FS959696.1|FS959696 FS959696 tea plant young leaves cDNA library Camellia sinensis cDNA clone YL16A06, mRNA sequence 
GAAACTGCATATAGAAAATCTCACTACCACTCTCTTCCTCTTCCTCTCTATCTTTCCTACCAAAGAAAG... 

>gi|393750830|gb|FS956287.1|FS956287 FS956287 tea plant terminal buds cDNA library Camellia sinensis cDNA clone TB26G04, mRNA sequence 
AGGATCGCACGGCCTTTGTGCCGGCGACGCATCATTCAAATTTCTGCCCTATCAACTTTCGATGGTAGGA 
TAGT... 

>gi|393750831|gb|FS956288.1|FS956288 FS956288 tea plant terminal buds cDNA library Camellia sinensis cDNA clone TB26G05, mRNA sequence 
TCCCACAAACATGTTGCTCTCATCTTTCCAGTAAAAGATAGAGAGAGAGAGAGAGAGAACAAAGCAG... 

Answer

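Don't convert to a dictionary. What you need is the description from each defline, and to_dict() only uses the id as the key.

The description is just a string, so you can search it for the terms. Group the records by category (a record may belong to more than one category), then save each group to a file with SeqIO.write():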

from Bio import SeqIO

terms = ["leaves", "buds", "stems", "tender shoots"]
categorized_records = {term: [] for term in terms}

# Group records under every term found in the description (the full defline text)
for record in SeqIO.parse("my_example.fasta", "fasta"):
    for term in terms:
        if term in record.description:
            categorized_records[term].append(record)

# Write one FASTA file per term
for term, matched in categorized_records.items():
    fasta_out = "%s.fasta" % term
    SeqIO.write(matched, fasta_out, "fasta")  # Will overwrite an existing file
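
Note that the plain `in` test is case-sensitive and also matches substrings inside longer words. If that matters for your data, a stricter matching step could look roughly like the sketch below. It works on plain description strings so it runs without Biopython. `categorize` and `TERMS` are hypothetical names for illustration, not part of Biopython, and with real data you would pass in `record.description` values.

```python
import re

# Hypothetical helper names; nothing here is part of Biopython.
TERMS = ["leaves", "buds", "stems", "tender shoots"]


def categorize(descriptions, terms=TERMS):
    """Map each term to the descriptions that mention it as a whole word,
    ignoring case. A description can land under several terms."""
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    hits = {term: [] for term in terms}
    for description in descriptions:
        for term, pattern in patterns.items():
            if pattern.search(description):
                hits[term].append(description)
    return hits


# Descriptions copied from the example headers in the question
headers = [
    "FS945568 tea plant lateral roots cDNA library Camellia sinensis cDNA clone LR29G09, mRNA sequence",
    "FS959695 tea plant young leaves cDNA library Camellia sinensis cDNA clone YL16A05, mRNA sequence",
    "FS956287 tea plant terminal buds cDNA library Camellia sinensis cDNA clone TB26G04, mRNA sequence",
]

# Print how many descriptions matched each term
print({term: len(found) for term, found in categorize(headers).items()})
```

The same patterns could replace the `term in record.description` check in the loop above, keeping the rest of the code unchanged.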