2013-03-14 50 views
1

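Reading multiple BLAST files (Biopython)

I am trying to read a list of XML files generated by submitting multiple sequences to the NCBI BLAST website. From each file, I want to print certain information. The files I want to read all have the suffix "_recombination.xml".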

import glob
from Bio.Blast import NCBIXML

for file in glob.glob("*_recombination.xml"):
    result_handle = open(file)
    blast_record = NCBIXML.read(result_handle)
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print("*****Alignment****")
            print("sequence: %s" % alignment.title)
            print("length: %s" % alignment.length)
            print("e-value: %s" % hsp.expect)
            print(hsp.query)
            print(hsp.match)
            print(hsp.sbjct)

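The script first finds all files with the "_recombination.xml" suffix. I then want it to read each file and print certain lines from it (this is almost a direct copy from the Biopython cookbook), which it seems to do. But I get the following error: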

Traceback (most recent call last): 
File "Scripts/blast_test.py", line 202, in <module> 
blast_record=NCBIXML.read(result_handle) 
File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 576, in read 
first = iterator.next() 
File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 643, in parse 
expat_parser.Parse("", True) # End of XML record 
xml.parsers.expat.ExpatError: no element found: line 3106, column 7594 

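I'm not sure where the problem lies. I wonder whether it is trying to loop back over files it has already read, since closing the files seemed to help: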

for file in glob.glob("*_recombination.xml"):
    result_handle = open(file)
    blast_record = NCBIXML.read(result_handle)
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print("*****Alignment****")
            print("sequence: %s" % alignment.title)
            print("length: %s" % alignment.length)
            print("e-value: %s" % hsp.expect)
            print(hsp.query)
            print(hsp.match)
            print(hsp.sbjct)
    result_handle.close()
    blast_record.close()

However, this gives me a different error:

Traceback (most recent call last): 
File "Scripts/blast_test.py", line 213, in <module> 
blast_record.close() 
AttributeError: 'Blast' object has no attribute 'close' 
+0

Remove the line blast_record.close(); the parsed object has no close method (that is what the AttributeError is trying to tell you). – peterjc 2013-03-14 11:28:47

+0

The ExpatError may be caused by a broken XML file, for example truncated output. Have you checked by eye the specific file it complains about? – peterjc 2013-03-14 11:29:57
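
One way to find the file the parser complains about is to run every file through a plain XML well-formedness check first. The following is a minimal sketch using only the standard library. The helper name `find_broken_xml` is an assumption made for illustration, and it only checks that each file is well-formed XML, not that it is valid BLAST output:

```python
import glob
import xml.etree.ElementTree as ET


def find_broken_xml(pattern):
    """Return (path, error message) pairs for files that are not well-formed XML."""
    broken = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, "rb") as handle:
                # Stream through the file without keeping the whole tree in memory.
                for _event, elem in ET.iterparse(handle):
                    elem.clear()
        except ET.ParseError as err:
            # A truncated file fails here, for example with "no element found".
            broken.append((path, str(err)))
    return broken


if __name__ == "__main__":
    for path, message in find_broken_xml("*_recombination.xml"):
        print("%s: %s" % (path, message))
```

Any file reported here would be worth regenerating or downloading again before parsing it with Biopython.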

Answers

2

I usually use the parse method rather than read; maybe it will help you:

from Bio.Blast import NCBIXML

# input_xml is the path to one BLAST XML file
with open(input_xml) as result_handle:
    for blast_record in NCBIXML.parse(result_handle):
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                print("*****Alignment****")
                print("sequence: %s" % alignment.title)
                print("length: %s" % alignment.length)
                print("e-value: %s" % hsp.expect)
                print(hsp.query)
                print(hsp.match)
                print(hsp.sbjct)

Also make sure your XML was generated with -outfmt 5 in your query.
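
To check that a file really is BLAST XML (the format `-outfmt 5` produces) before handing it to the parser, one option is to look at the root element, which in that format is `BlastOutput`. This is a minimal sketch; `looks_like_blast_xml` is a hypothetical helper, not part of Biopython:

```python
import xml.etree.ElementTree as ET


def looks_like_blast_xml(path):
    """Return True if the first element in the file is <BlastOutput>."""
    try:
        with open(path, "rb") as handle:
            # The first "start" event is the root element; stop right after it.
            for _event, elem in ET.iterparse(handle, events=("start",)):
                return elem.tag == "BlastOutput"
    except ET.ParseError:
        # Not XML at all, for example tabular or plain-text BLAST output.
        return False
    return False  # no element was seen
```

A file that fails this check was probably saved in another output format and would need to be regenerated as XML.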

0

I would add a comment to Biogeek's answer, but I can't (not enough reputation yet). Indeed he is right; you should use

NCBIXML.parse(open(input_xml)) 

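instead of NCBIXML.read(open(input_xml)), because you "want to read a list of XML files", and for a list of XML files you need parse rather than read. Did that solve your problem?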
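
For context, in Biopython `NCBIXML.read` expects a handle holding exactly one BLAST record, while `NCBIXML.parse` returns an iterator over however many records the file contains, so `parse` works for both single-query and multi-query files. Combining it with the `glob` loop from the question might look like the sketch below. The generator name `iter_blast_records` and its `parse` argument (a hook so the loop can be exercised without Biopython installed) are assumptions made for illustration:

```python
import glob


def iter_blast_records(pattern, parse=None):
    """Yield (path, record) for every BLAST record in every matching file."""
    if parse is None:
        # Imported lazily so the loop can also be run with a stand-in parser.
        from Bio.Blast import NCBIXML  # requires Biopython
        parse = NCBIXML.parse
    for path in sorted(glob.glob(pattern)):
        # The with-block closes each file once its records have been consumed.
        with open(path) as handle:
            for record in parse(handle):
                yield path, record
```

Iterating with `for path, record in iter_blast_records("*_recombination.xml")` then gives each record together with the file it came from, and `record.alignments` can be walked exactly as in the question. Only the file handle is closed; the record object itself needs no cleanup.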