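Reading multiple BLAST files (Biopython)

I am trying to read through a list of XML files generated by submitting multiple sequences to the NCBI BLAST website. From each file, I want to print certain information. The files I want to read all have the suffix "_recombination.xml".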
    import glob
    from Bio.Blast import NCBIXML

    for file in glob.glob("*_recombination.xml"):
        result_handle = open(file)
        # each file is expected to hold a single BLAST record
        blast_record = NCBIXML.read(result_handle)
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                print "*****Alignment****"
                print "sequence:", alignment.title
                print "length:", alignment.length
                print "e-value:", hsp.expect
                print hsp.query
                print hsp.match
                print hsp.sbjct
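The script first finds all the files with the "_recombination.xml" suffix. I then want it to read each file and print certain lines (this is almost a direct copy from the Biopython cookbook), which it seems to do. But I get the following error: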
    Traceback (most recent call last):
      File "Scripts/blast_test.py", line 202, in <module>
        blast_record=NCBIXML.read(result_handle)
      File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 576, in read
        first = iterator.next()
      File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 643, in parse
        expat_parser.Parse("", True) # End of XML record
    xml.parsers.expat.ExpatError: no element found: line 3106, column 7594
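I am not sure what the problem is. I don't know whether it is trying to loop back through files it has already read. For example, closing the files seems to help: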
    for file in glob.glob("*_recombination.xml"):
        result_handle = open(file)
        blast_record = NCBIXML.read(result_handle)
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                print "*****Alignment****"
                print "sequence:", alignment.title
                print "length:", alignment.length
                print "e-value:", hsp.expect
                print hsp.query
                print hsp.match
                print hsp.sbjct
        result_handle.close()
        blast_record.close()
However, this gives me a different error:
    Traceback (most recent call last):
      File "Scripts/blast_test.py", line 213, in <module>
        blast_record.close()
    AttributeError: 'Blast' object has no attribute 'close'
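Remove the line blast_record.close(); the parsed object has no close method (which is what the AttributeError is trying to tell you). – peterjc 2013-03-14 11:28:47
The ExpatError is probably caused by a broken XML file, e.g. truncated output. Have you checked by eye the specific file it is complaining about? – peterjc 2013-03-14 11:29:57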
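
To narrow down which file the parser is complaining about, one option is to run every matching file through a plain XML well-formedness check before handing it to Biopython. The following is only a minimal sketch, assuming nothing beyond the Python standard library; `find_broken_xml` is a hypothetical helper name (not part of Biopython), and it only detects malformed or truncated XML, not whether the content is valid BLAST output.

```python
import glob
from xml.parsers import expat


def find_broken_xml(pattern):
    """Return (filename, error message) pairs for files that are not well-formed XML.

    Hypothetical helper for illustration; it does not use Biopython.
    """
    broken = []
    for filename in sorted(glob.glob(pattern)):
        # a fresh parser is needed for every file
        parser = expat.ParserCreate()
        # expat reads bytes; "with" closes the handle even if parsing fails
        with open(filename, "rb") as handle:
            try:
                parser.ParseFile(handle)
            except expat.ExpatError as err:
                broken.append((filename, str(err)))
    return broken


if __name__ == "__main__":
    for filename, message in find_broken_xml("*_recombination.xml"):
        print("%s: %s" % (filename, message))
```

Any file reported here would be worth opening by hand or downloading again. Opening each file in a `with` block also takes care of closing the handle, so no explicit `close()` call is needed on either the handle or the parsed record.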