Reading multiple BLAST files (Biopython)

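I am trying to read a list of XML files generated by submitting multiple sequences to the NCBI BLAST website. From each file, I want to print certain pieces of information. The files I want to read all have the suffix "_recombination.xml".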

import glob
from Bio.Blast import NCBIXML

for filename in glob.glob("*_recombination.xml"):
    result_handle = open(filename)
    blast_record = NCBIXML.read(result_handle)
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print("*****Alignment****")
            print("sequence: %s" % alignment.title)
            print("length: %s" % alignment.length)
            print("e-value: %s" % hsp.expect)
            print(hsp.query)
            print(hsp.match)
            print(hsp.sbjct)

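The script first finds all files with the "_recombination.xml" suffix. I then want it to read each file and print certain lines from it (this is almost a straight copy from the Biopython Cookbook), which it seems to do. But I get the following error: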

Traceback (most recent call last):
  File "Scripts/blast_test.py", line 202, in <module>
    blast_record=NCBIXML.read(result_handle)
  File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 576, in read
    first = iterator.next()
  File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 643, in parse
    expat_parser.Parse("", True) # End of XML record
xml.parsers.expat.ExpatError: no element found: line 3106, column 7594

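I'm not sure where the problem lies. I don't know whether the script is trying to loop back over files it has already read. Closing the files, for example, seems to help: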

import glob
from Bio.Blast import NCBIXML

for filename in glob.glob("*_recombination.xml"):
    result_handle = open(filename)
    blast_record = NCBIXML.read(result_handle)
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print("*****Alignment****")
            print("sequence: %s" % alignment.title)
            print("length: %s" % alignment.length)
            print("e-value: %s" % hsp.expect)
            print(hsp.query)
            print(hsp.match)
            print(hsp.sbjct)
    result_handle.close()
    blast_record.close()

However, this gives me a different error:

Traceback (most recent call last):
  File "Scripts/blast_test.py", line 213, in <module>
    blast_record.close()
AttributeError: 'Blast' object has no attribute 'close'

2013-03-14 user2168818

Remove the line blast_record.close(); the parsed object has no close method (that is what the AttributeError is trying to tell you). – peterjc 2013-03-14 11:28:47

The ExpatError is probably caused by a broken XML file, for example truncated output. Have you checked by eye the specific file it complains about? – peterjc 2013-03-14 11:29:57
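
One way to find the file in question is to check every file for well-formedness before handing it to Biopython. The following is only a sketch: it uses the standard library's expat parser (the same parser that raised the error in the traceback), the helper name find_broken_xml is made up for illustration, and it assumes Python 3.

```python
import glob
import xml.parsers.expat


def find_broken_xml(pattern):
    """Return (path, error message) pairs for files that are not well-formed XML.

    find_broken_xml is a hypothetical helper, not part of Biopython.
    """
    broken = []
    for path in sorted(glob.glob(pattern)):
        # A parser object can only be used once, so create one per file.
        parser = xml.parsers.expat.ParserCreate()
        try:
            # Binary mode lets expat honour the encoding declared in the file.
            with open(path, "rb") as handle:
                parser.ParseFile(handle)
        except xml.parsers.expat.ExpatError as err:
            # Truncated or otherwise malformed files end up here.
            broken.append((path, str(err)))
    return broken


if __name__ == "__main__":
    for path, message in find_broken_xml("*_recombination.xml"):
        print("%s: %s" % (path, message))
```

Any file reported here would be worth opening by hand or downloading again. A file that passes is well-formed XML, which does not by itself guarantee that it is valid BLAST output.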

I usually use the parse method rather than read; maybe it will help you:

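from Bio.Blast import NCBIXML

input_xml = "my_blast.xml"  # placeholder path, replace with your own file

# The with block closes the file once all records have been read.
with open(input_xml) as result_handle:
    for blast_record in NCBIXML.parse(result_handle):
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                print("*****Alignment****")
                print("sequence: %s" % alignment.title)
                print("length: %s" % alignment.length)
                print("e-value: %s" % hsp.expect)
                print(hsp.query)
                print(hsp.match)
                print(hsp.sbjct)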

Also, make sure your XML was generated with -outfmt 5 in your query.
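
A quick way to confirm that a file really is BLAST XML is to look at its root element. The sketch below assumes the classic XML layout written by -outfmt 5, whose root element is BlastOutput; other layouts are not covered. The helper name looks_like_blast_xml is hypothetical, and the code assumes Python 3.

```python
import xml.etree.ElementTree as ET


def looks_like_blast_xml(path):
    """Return True if the first element in the file is <BlastOutput>.

    Hypothetical helper; assumes the classic -outfmt 5 XML layout.
    """
    try:
        with open(path, "rb") as handle:
            # iterparse yields elements lazily, so we stop after the root's start tag.
            for _event, element in ET.iterparse(handle, events=("start",)):
                return element.tag == "BlastOutput"
    except ET.ParseError:
        # Not XML at all, e.g. tabular or plain-text BLAST output, or an empty file.
        return False
    return False
```

Calling looks_like_blast_xml on each file matched by glob.glob("*_recombination.xml") would flag files saved in another output format. Because it only inspects the start of the file, it will not notice truncation further down.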

2014-08-13 18:16:18 Biosys

I would add a comment to Biogeek's answer, but I can't (I don't have enough reputation yet). He is indeed right: you should use

NCBIXML.parse(open(input_xml))

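instead of NCBIXML.read(open(input_xml)), because you "want to read a list of XML files", and with a list of XML files you need parse rather than read. Did it solve your problem?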
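
As a side note on the two functions: in Biopython, NCBIXML.read is meant for a handle that holds exactly one BLAST record, while NCBIXML.parse yields records one at a time and copes with several, so which one fits depends on how many queries each file contains. The sketch below counts them with the standard library only. It assumes the classic -outfmt 5 layout, in which each query appears as one Iteration element, and Python 3; count_iterations is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def count_iterations(path):
    """Count <Iteration> elements in a BLAST XML file.

    Hypothetical helper; assumes one <Iteration> per query (classic -outfmt 5 layout).
    """
    count = 0
    with open(path, "rb") as handle:
        for _event, element in ET.iterparse(handle, events=("end",)):
            if element.tag == "Iteration":
                count += 1
            # Release the element's contents as we go; BLAST XML files can be large.
            element.clear()
    return count
```

A result of 1 suggests read should be sufficient for that file, and anything larger calls for parse. A truncated file raises xml.etree.ElementTree.ParseError here, which is itself a useful hint.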

2016-02-02 13:22:30 Cat
