2009-11-05 21 views
3

我有一個XML格式的BLAST輸出文件。它是從每個序列中報告的具有50個命中的22個查詢序列。我想提取所有50x22的點擊。這是我目前擁有的代碼,但它僅從第一個查詢中提取50個匹配。BioPython:從Blast輸出文件中提取序列ID

from Bio.Blast import NCBIXM 
blast_records = NCBIXML.parse(result_handle) 
blast_record = blast_records.next() 

save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w') 

for alignment in blast_record.alignments: 
    for hsp in alignment.hsps: 
      save_file.write('>%s\n' % (alignment.title,)) 
save_file.close() 

有人有任何建議,以提取所有的命中?我想我必須使用別的比對齊。 希望這很清楚。謝謝!

Jon

+0

爲什麼雙重帖子? http://stackoverflow.com/questions/1684194/python-saving-output-of-a-for-loop-to-file一個小時前左右? – mjv 2009-11-05 23:57:56

+0

由於這裏的大多數人可能不使用BioPython,如果你提供了一些有用的鏈接,你可能會得到更多的答案 – 2009-11-05 23:59:12

+0

mjv:以前的文章是關於如何保存輸出。這幾乎是相同的代碼,但現在我想改變它。 gnibbler:你是指什麼有用的鏈接?像鏈接幫助解答?我一直在檢查很多鏈接。像biopython文檔一樣,但問題是我很難閱讀這些文檔 – Jon 2009-11-06 00:34:08

回答

3

這應該得到所有記錄。與原來的相比,新穎性是

for blast_record in blast_records 

這是一個python成語通過項目迭代在一個「類似列表的」對象,如blast_records(檢查CBIXML module documentation表明解析()確實返回一個迭代器)

from Bio.Blast import NCBIXM 
blast_records = NCBIXML.parse(result_handle) 

save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w') 

for blast_record in blast_records: 
    for alignment in blast_record.alignments: 
     for hsp in alignment.hsps: 
      save_file.write('>%s\n' % (alignment.title,)) 
    #here possibly to output something to file, between each blast_record 
save_file.close() 
2

我使用此代碼爲提取所有結果

from Bio.Blast import NCBIXML 
for record in NCBIXML.parse(open("rpoD.xml")) : 
    print "QUERY: %s" % record.query 
    for align in record.alignments : 
     print " MATCH: %s..." % align.title[:60] 
     for hsp in align.hsps : 
      print " HSP, e=%f, from position %i to %i" \ 
       % (hsp.expect, hsp.query_start, hsp.query_end) 
      if hsp.align_length < 60 : 
       print " Query: %s" % hsp.query 
       print " Match: %s" % hsp.match 
       print " Sbjct: %s" % hsp.sbjct 
      else : 
       print " Query: %s..." % hsp.query[:57] 
       print " Match: %s..." % hsp.match[:57] 
       print " Sbjct: %s..." % hsp.sbjct[:57] 


print "Done" 

或更少的細節

from Bio.Blast import NCBIXML 
for record in NCBIXML.parse(open("NC_003197.xml")) : 
    #We want to ignore any queries with no search results: 
    if record.alignments : 
     print "QUERY: %s..." % record.query[:60] 
     for align in record.alignments : 
      for hsp in align.hsps : 
       print " %s HSP, e=%f, from position %i to %i" \ 
       % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end) 
print "Done" 

我用這個網站

http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/rpsblast/