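I am working with a large FASTA file that I want to split into multiple files by gene ID. I am trying to use the script below, taken from the Biopython tutorial, which splits a large FASTA file into multiple files: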
from Bio import SeqIO


def batch_iterator(iterator, batch_size):
    """Returns lists of length batch_size.

    This can be used on any iterator, for example to batch up
    SeqRecord objects from Bio.SeqIO.parse(...), or to batch
    Alignment objects from Bio.AlignIO.parse(...), or simply
    lines from a file handle.

    This is a generator function, and it returns lists of the
    entries from the supplied iterator. Each list will have
    batch_size entries, although the final list may be shorter.
    """
    entry = True  # Make sure we loop once
    while entry:
        batch = []
        while len(batch) < batch_size:
            try:
                # next(iterator) works on Python 2.6+ and Python 3;
                # iterator.next() only exists on Python 2
                entry = next(iterator)
            except StopIteration:
                entry = None
            if entry is None:
                # End of file
                break
            batch.append(entry)
        if batch:
            yield batch


record_iter = SeqIO.parse(open('/path/sorted_sequences.fa'), 'fasta')
# Write every batch of 93 records to its own numbered file
for i, batch in enumerate(batch_iterator(record_iter, 93)):
    filename = 'gene_%i.fasta' % (i + 1)
    with open('/path/files/' + filename, 'w') as output_handle:
        count = SeqIO.write(batch, output_handle, 'fasta')
    print('Wrote %i records to %s' % (count, filename))
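It does not split the file into groups of 93 sequences as they appear in the file; instead it gives me 2 files for every 93. I can't see the error, but I assume there is one. Is there another way to split a large FASTA file? Thanks.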
What do you mean by "it gives 2 files for every 93"? – rodgdor
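The script produces duplicate files, i.e. 2 files, each containing 93 genes, and each one with gene_1. I know there are 93 of each. So after writing the first file of 93 sequences it should move on to the next 93, but it doesn't. – Ana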
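
One possible alternative, as a rough sketch only: if the file is sorted so that all sequences of one gene sit next to each other (the name sorted_sequences.fa suggests this, but it is an assumption), the records could be grouped by gene ID with itertools.groupby instead of by a fixed count of 93. The ID format and the gene_key and group_consecutive helpers below are hypothetical; the real key depends on how the FASTA headers are written. Plain strings stand in for the record IDs so that the example runs without Biopython.

```python
from itertools import groupby


def gene_key(record_id):
    """Hypothetical key: the part of the ID before the last underscore.

    Assumes IDs look like 'geneA_1', 'geneA_2', ...; adapt to the real headers.
    """
    return record_id.rsplit("_", 1)[0]


def group_consecutive(records, key):
    """Yield (key, list_of_records) for each run of adjacent records sharing a key.

    groupby only merges neighbours, so the input must already be sorted
    (or at least clustered) by that key.
    """
    for k, run in groupby(records, key=key):
        yield k, list(run)


# Stand-in data: plain ID strings instead of SeqRecord objects
ids = ["geneA_1", "geneA_2", "geneB_1", "geneB_2", "geneB_3"]
groups = list(group_consecutive(ids, gene_key))
for name, members in groups:
    print("%s: %i records" % (name, len(members)))
```

With Biopython, the same idea would take the records from SeqIO.parse(...), use key=lambda rec: gene_key(rec.id), and write each group with SeqIO.write(members, name + '.fasta', 'fasta'), so every gene ends up in exactly one file regardless of how many sequences it has.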