BioPython：從Blast輸出文件中提取序列ID

我有一個XML格式的BLAST輸出文件。它是從每個序列中報告的具有50個命中的22個查詢序列。我想提取所有50x22的點擊。這是我目前擁有的代碼，但它僅從第一個查詢中提取50個匹配。BioPython：從Blast輸出文件中提取序列ID

from Bio.Blast import NCBIXM 
blast_records = NCBIXML.parse(result_handle) 
blast_record = blast_records.next() 

save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w') 

for alignment in blast_record.alignments: 
    for hsp in alignment.hsps: 
      save_file.write('>%s\n' % (alignment.title,)) 
save_file.close()

有人有任何建議，以提取所有的命中？我想我必須使用別的比對齊。希望這很清楚。謝謝！

Jon

來源

2009-11-05 Jon

爲什麼雙重帖子？ http://stackoverflow.com/questions/1684194/python-saving-output-of-a-for-loop-to-file一個小時前左右？ – mjv 2009-11-05 23:57:56

由於這裏的大多數人可能不使用BioPython，如果你提供了一些有用的鏈接，你可能會得到更多的答案 – 2009-11-05 23:59:12

mjv：以前的文章是關於如何保存輸出。這幾乎是相同的代碼，但現在我想改變它。 gnibbler：你是指什麼有用的鏈接？像鏈接幫助解答？我一直在檢查很多鏈接。像biopython文檔一樣，但問題是我很難閱讀這些文檔 – Jon 2009-11-06 00:34:08

這應該得到所有記錄。與原來的相比，新穎性是

for blast_record in blast_records

這是一個python成語通過項目迭代在一個「類似列表的」對象，如blast_records（檢查CBIXML module documentation表明解析（）確實返回一個迭代器）

from Bio.Blast import NCBIXM 
blast_records = NCBIXML.parse(result_handle) 

save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w') 

for blast_record in blast_records: 
    for alignment in blast_record.alignments: 
     for hsp in alignment.hsps: 
      save_file.write('>%s\n' % (alignment.title,)) 
    #here possibly to output something to file, between each blast_record 
save_file.close()

來源

2009-11-06 01:33:46 mjv

我使用此代碼爲提取所有結果

from Bio.Blast import NCBIXML 
for record in NCBIXML.parse(open("rpoD.xml")) : 
    print "QUERY: %s" % record.query 
    for align in record.alignments : 
     print " MATCH: %s..." % align.title[:60] 
     for hsp in align.hsps : 
      print " HSP, e=%f, from position %i to %i" \ 
       % (hsp.expect, hsp.query_start, hsp.query_end) 
      if hsp.align_length < 60 : 
       print " Query: %s" % hsp.query 
       print " Match: %s" % hsp.match 
       print " Sbjct: %s" % hsp.sbjct 
      else : 
       print " Query: %s..." % hsp.query[:57] 
       print " Match: %s..." % hsp.match[:57] 
       print " Sbjct: %s..." % hsp.sbjct[:57] 


print "Done"

或更少的細節

from Bio.Blast import NCBIXML 
for record in NCBIXML.parse(open("NC_003197.xml")) : 
    #We want to ignore any queries with no search results: 
    if record.alignments : 
     print "QUERY: %s..." % record.query[:60] 
     for align in record.alignments : 
      for hsp in align.hsps : 
       print " %s HSP, e=%f, from position %i to %i" \ 
       % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end) 
print "Done"

我用這個網站

http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/rpsblast/

來源

2010-03-24 06:04:16

BioPython：從Blast輸出文件中提取序列ID

回答

相關問題