2016-03-14

AttributeError: 'NodeList' object has no attribute 'getElementsByTagName'

I want to extract the text content from an XML file. The XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?> 
<brca:tcga_bcr xsi:schemaLocation="http://tcga.nci/bcr/xml/clinical/brca/2.7 http://tcga-data.nci.nih.gov/docs/xsd/BCR/tcga.nci/bcr/xml/clinical/brca/2.7/TCGA_BCR.BRCA_Clinical.xsd" schemaVersion="2.7" xmlns:brca="http://tcga.nci/bcr/xml/clinical/brca/2.7" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:admin="http://tcga.nci/bcr/xml/administration/2.7" xmlns:clin_shared="http://tcga.nci/bcr/xml/clinical/shared/2.7" xmlns:shared="http://tcga.nci/bcr/xml/shared/2.7" xmlns:brca_shared="http://tcga.nci/bcr/xml/clinical/brca/shared/2.7" xmlns:shared_stage="http://tcga.nci/bcr/xml/clinical/shared/stage/2.7" xmlns:brca_nte="http://tcga.nci/bcr/xml/clinical/brca/shared/new_tumor_event/2.7/1.0" xmlns:nte="http://tcga.nci/bcr/xml/clinical/shared/new_tumor_event/2.7" xmlns:rx="http://tcga.nci/bcr/xml/clinical/pharmaceutical/2.7" xmlns:rad="http://tcga.nci/bcr/xml/clinical/radiation/2.7"> 
    <admin:admin> 
     <admin:bcr xsd_ver="1.17">Nationwide Children's Hospital</admin:bcr> 
     <admin:file_uuid xsd_ver="2.6">6CEF6ECD-264E-4DF6-8419-9E4C564DA7B2</admin:file_uuid> 
     <admin:batch_number xsd_ver="1.17">85.84.0</admin:batch_number> 
     <admin:project_code xsd_ver="">TCGA</admin:project_code> 
     <admin:disease_code xsd_ver="2.6">BRCA</admin:disease_code> 
     <admin:day_of_dcc_upload xsd_ver="1.17">21</admin:day_of_dcc_upload> 
     <admin:month_of_dcc_upload xsd_ver="1.17">1</admin:month_of_dcc_upload> 
     <admin:year_of_dcc_upload xsd_ver="1.17">2016</admin:year_of_dcc_upload> 
     <admin:patient_withdrawal> 
      <admin:withdrawn>false</admin:withdrawn> 
     </admin:patient_withdrawal> 
    </admin:admin> 

I get the following error:

AttributeError: 'NodeList' object has no attribute 'getElementsByTagName'

I am using Python 2.7. This is part of my code. I can't figure out what is wrong. Any suggestions?

from xml.dom import minidom 
xmldoc = minidom.parse('A0SD.xml') 
bcr = xmldoc.getElementsByTagNameNS('*','tcga_bcr') 
patient_info = bcr.getElementsByTagName('admin') 

Answer

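getElementsByTagNameNS returns a list of all nodes with the specified tag, so bcr is a NodeList. You can't call getElementsByTagName on a NodeList, only on a Node. You need to iterate over bcr to get the admin tags inside each node. Or, if you expect exactly one tcga_bcr tag, you can take the first element of the list: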

bcr = xmldoc.getElementsByTagNameNS('*','tcga_bcr')[0] 
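
The iteration option mentioned above could look roughly like the sketch below. It is not the asker's actual script: SAMPLE is a trimmed-down stand-in for A0SD.xml, and text_of and fields are hypothetical names introduced only for illustration. It also assumes the admin elements should be looked up with getElementsByTagNameNS('*', 'admin'), because in the file the tag is written with a prefix (admin:admin) and, as far as I can tell, minidom's getElementsByTagName compares against the full prefixed name.

```python
from xml.dom import minidom, Node

# Trimmed-down stand-in for the file in the question (assumption: same
# element names, most namespaces and fields left out).
SAMPLE = """<brca:tcga_bcr xmlns:brca="http://tcga.nci/bcr/xml/clinical/brca/2.7"
               xmlns:admin="http://tcga.nci/bcr/xml/administration/2.7">
    <admin:admin>
        <admin:project_code xsd_ver="">TCGA</admin:project_code>
        <admin:disease_code xsd_ver="2.6">BRCA</admin:disease_code>
    </admin:admin>
</brca:tcga_bcr>"""

xmldoc = minidom.parseString(SAMPLE)


def text_of(element):
    """Join the direct text children of an element (hypothetical helper)."""
    return "".join(
        child.data for child in element.childNodes
        if child.nodeType == Node.TEXT_NODE
    )


fields = {}
# getElementsByTagNameNS returns a NodeList, so loop over it instead of
# calling element methods on the list itself.
for bcr in xmldoc.getElementsByTagNameNS('*', 'tcga_bcr'):
    for admin in bcr.getElementsByTagNameNS('*', 'admin'):
        # Collect the text of each child element, keyed by its local name.
        for child in admin.childNodes:
            if child.nodeType == Node.ELEMENT_NODE:
                fields[child.localName] = text_of(child)

print(fields)
```

With the real file you would replace minidom.parseString(SAMPLE) with minidom.parse('A0SD.xml'); the loops stay the same whether the document contains one tcga_bcr element or several.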

It works. Thank you very much!