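Splitting a large FASTA file into multiple files: can't name them by GI number

I should start by saying that I'm new to both Python and Biopython. I'm trying to split a large .fasta file (containing multiple entries) into individual files, each holding a single entry. I found most of the following code on the Biopython wiki/Cookbook site and modified it slightly. My problem is that this generator names the files "1.fasta", "2.fasta", etc., and I need them named by some identifier (for example, the GI number).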
```python
def batch_iterator(iterator, batch_size):
    """Returns lists of length batch_size.

    This can be used on any iterator, for example to batch up
    SeqRecord objects from Bio.SeqIO.parse(...), or to batch
    Alignment objects from Bio.AlignIO.parse(...), or simply
    lines from a file handle.

    This is a generator function, and it returns lists of the
    entries from the supplied iterator. Each list will have
    batch_size entries, although the final list may be shorter.
    """
    entry = True  # Make sure we loop once
    while entry:
        batch = []
        while len(batch) < batch_size:
            try:
                entry = next(iterator)
            except StopIteration:
                entry = None
            if entry is None:
                # End of file
                break
            batch.append(entry)
        if batch:
            yield batch


from Bio import SeqIO

infile = input('Which .fasta file would you like to open? ')
record_iter = SeqIO.parse(open(infile), "fasta")
for i, batch in enumerate(batch_iterator(record_iter, 1)):
    # Raw string so the Windows backslashes are not treated as escape sequences
    outfile = r"c:\python32\myfiles\%i.fasta" % (i + 1)
    handle = open(outfile, "w")
    count = SeqIO.write(batch, handle, "fasta")
    handle.close()
```
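As its docstring says, the Cookbook's `batch_iterator` works on any iterator. The more compact version below is an alternative written with `itertools.islice`, not the Cookbook's own code. It is a minimal sketch that makes one detail easy to see: with `batch_size=1`, every batch is a one-element list, so the single `SeqRecord` in each batch is `batch[0]`.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_iterator(iterable: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items; the final list may be shorter."""
    iterator = iter(iterable)
    while True:
        # islice takes at most batch_size items and simply stops at the end,
        # so no StopIteration handling is needed here
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Strings stand in for SeqRecord objects here
batches = list(batch_iterator(["rec1", "rec2", "rec3"], 1))
print(batches)  # [['rec1'], ['rec2'], ['rec3']]
```

Because each batch holds exactly one item when `batch_size` is 1, batching adds nothing for this task; looping over the records directly gives the same result.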
If I try to replace:
outfile = "c:\python32\myfiles\%i.fasta" % (i+1)
with:
outfile = "c:\python32\myfiles\%s.fasta" % (record_iter.id)
so that each file is named with something like the `seq_record.id` usage I've seen in SeqIO examples, I get the following error:
Traceback (most recent call last):
  File "C:\Python32\myscripts\generator.py", line 33, in <module>
outfile = "c:\python32\myfiles\%s.fasta" % (record_iter.id)
AttributeError: 'generator' object has no attribute 'id'
Given that the generator object has no attribute "id", is there some way around this? Is this script too complicated for what I'm trying to do?!? Thanks, Charles
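The traceback arises because `record_iter` is the generator returned by `SeqIO.parse`; `.id` is an attribute of each `SeqRecord` it yields (in the loop above, that record is `batch[0]`). Legacy NCBI FASTA headers often look like `gi|123456|gb|ABC123.1|`, and the `|` characters are not allowed in Windows file names, so this minimal sketch pulls out just the GI number. Both `gi_number` and `output_path` are hypothetical helpers, and the sketch assumes that ID layout; adjust the parsing to your own headers.

```python
import os


def gi_number(record_id: str) -> str:
    """Hypothetical helper: return the GI from an NCBI-style ID such as
    'gi|123456|gb|ABC123.1|'. IDs not starting with 'gi|' are returned unchanged."""
    fields = record_id.split("|")
    if len(fields) > 1 and fields[0] == "gi":
        return fields[1]
    return record_id


def output_path(record_id: str, outdir: str) -> str:
    """Hypothetical helper: build <outdir>/<GI>.fasta for one record ID."""
    return os.path.join(outdir, gi_number(record_id) + ".fasta")


# In the question's loop this would become:
#     outfile = output_path(batch[0].id, r"c:\python32\myfiles")
print(gi_number("gi|123456|gb|ABC123.1|"))  # 123456
```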
This seems like the best and simplest approach. Opening the output file would be cleaner with `with open(outfile, "w") as handle:` – weronika
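weronika's suggestion, applied to both the input and output files, could look roughly like this minimal sketch. The function name `split_fasta_with_handles` and its `outdir` parameter are hypothetical. The sketch loops over records directly instead of batching and assumes each `record.id` is already a valid file name. Biopython is imported inside the function.

```python
import os


def split_fasta_with_handles(infile: str, outdir: str) -> None:
    """Hypothetical helper: write each FASTA record to <outdir>/<record.id>.fasta."""
    from Bio import SeqIO  # Biopython; imported only when the function runs

    # 'with' closes each handle even if parsing or writing raises part-way through
    with open(infile) as in_handle:
        for record in SeqIO.parse(in_handle, "fasta"):
            # Assumes record.id contains no characters illegal in file names
            outfile = os.path.join(outdir, record.id + ".fasta")
            with open(outfile, "w") as out_handle:
                SeqIO.write(record, out_handle, "fasta")
```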
Thanks, everyone, for the help! – user1426421
Alternatively, instead of opening the file in your own code, let Biopython do it: `count = SeqIO.write(seq_record, outfile, "fasta")` – peterjc
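peterjc's point, as a minimal sketch: `SeqIO.write` accepts a file name in place of a handle and a single `SeqRecord` in place of a list, and it returns the number of records written. The function name `split_fasta_by_filename` and the `outdir` parameter are hypothetical, and record IDs are assumed to be valid file names.

```python
import os


def split_fasta_by_filename(infile: str, outdir: str) -> int:
    """Hypothetical helper: write one file per record and return the total written."""
    from Bio import SeqIO  # Biopython; imported only when the function runs

    total = 0
    # SeqIO.parse also accepts a file name, so Biopython opens and closes the input
    for seq_record in SeqIO.parse(infile, "fasta"):
        outfile = os.path.join(outdir, seq_record.id + ".fasta")
        # Given a file name, SeqIO.write opens and closes the output file itself
        total += SeqIO.write(seq_record, outfile, "fasta")
    return total
```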