Parsing XML with ElementTree

I am trying to use ElementTree to search for tags and attributes in an XML string. Here is the string:

'<?xml version="1.0" encoding="UTF-8" ?>\n<uclassify xmlns="http://api.uclassify.com/1/ResponseSchema" version="1.01">\n\t<status success="true" statusCode="2000"/>\n\t<readCalls>\n\t<classify id="thing">\n\t\t<classification textCoverage="0">\n\t\t\t<class className="Astronomy" p="0.333333"/>\n\t\t\t<class className="Biology" p="0.333333"/>\n\t\t\t<class className="Mathematics" p="0.333333"/>\n\t\t</classification>\n\t</classify>\n\t</readCalls>\n</uclassify>'

Prettified:

<?xml version="1.0" encoding="UTF-8" ?>
<uclassify xmlns="http://api.uclassify.com/1/ResponseSchema" version="1.01">
    <status success="true" statusCode="2000"/>
    <readCalls>
        <classify id="thing">
            <classification textCoverage="0">
                <class className="Astronomy" p="0.333333"/>
                <class className="Biology" p="0.333333"/>
                <class className="Mathematics" p="0.333333"/>
            </classification>
        </classify>
    </readCalls>
</uclassify>

I use this small piece of code to turn the string (stored in a) into a searchable XML tree:

>>> from xml.etree.ElementTree import fromstring, ElementTree 
>>> tree = ElementTree(fromstring(a))

I thought tree.find('uclassify') would return that element/tag, but it seems to return nothing. I also tried:

for i in tree.iter():
    print(i)

It prints something, but not what I want:

<Element '{http://api.uclassify.com/1/ResponseSchema}uclassify' at 0x1011ec410> 
<Element '{http://api.uclassify.com/1/ResponseSchema}status' at 0x1011ec390> 
<Element '{http://api.uclassify.com/1/ResponseSchema}readCalls' at 0x1011ec450> 
<Element '{http://api.uclassify.com/1/ResponseSchema}classify' at 0x1011ec490> 
<Element '{http://api.uclassify.com/1/ResponseSchema}classification' at 0x1011ec4d0> 
<Element '{http://api.uclassify.com/1/ResponseSchema}class' at 0x1011ec510> 
<Element '{http://api.uclassify.com/1/ResponseSchema}class' at 0x1011ec550> 
<Element '{http://api.uclassify.com/1/ResponseSchema}class' at 0x1011ec590>

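What is the simplest way to search for tags and attributes, as in the BeautifulSoup module? For example, how can I easily retrieve the className and p attributes of the class elements? I have been reading various things about lxml, xml.dom.minidom and ElementTree, but I must be missing something, because I can't seem to get what I want.

Source

2012-08-09 bmay2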

First of all, uclassify is the root node, so if you print the root of the tree above you will see:

>>> tree.getroot()
<Element '{http://api.uclassify.com/1/ResponseSchema}uclassify' at 0x101f56410>

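find only looks at the direct children of the current node, so tree.find can only find the status and readCalls tags. That is also why tree.find('uclassify') returns nothing: the root is not one of its own children.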
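To illustrate the children-only behaviour in isolation, here is a minimal sketch on a made-up, namespace-free document (the element names a, b and c are placeholders, not part of the uClassify response). It also shows the './/' path prefix, which ElementTree supports for searching all descendants.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical three-level document, invented for this illustration.
root = fromstring('<a><b><c x="1"/></b></a>')

root_lookup = root.find('a')        # None: the root is not its own child
child_lookup = root.find('b')       # found: b is a direct child of a
grandchild_lookup = root.find('c')  # None: c is a grandchild, not a child
deep_lookup = root.find('.//c')     # found: './/' searches all descendants

print(root_lookup, child_lookup, grandchild_lookup, deep_lookup)
```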

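Finally, the XML namespace changes the name of every element, so you need to grab the xmlns and use it to build the tag names yourself:

xmlns = tree.getroot().tag.split("}")[0][1:]
readCalls = tree.find('{%s}readCalls' % (xmlns,))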
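The string handling above may be easier to follow on its own. ElementTree stores a namespaced tag as '{namespace-uri}localname', so the sketch below takes the root tag from the iteration output in the question as a plain string and shows what the split produces. The variable names are only for illustration.

```python
# Root tag exactly as printed in the question's iteration output.
tag = '{http://api.uclassify.com/1/ResponseSchema}uclassify'

# Everything before '}' is '{' + namespace URI; [1:] drops the leading '{'.
xmlns = tag.split('}')[0][1:]

# Rebuild a fully qualified name for another element in the same namespace.
qualified = '{%s}readCalls' % xmlns

print(xmlns)
print(qualified)
```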

For example, to get the 3 class tags, you would need:

classify = readCalls.find('{%s}classify' % (xmlns,))
classification = classify.find('{%s}classification' % (xmlns,))
classes = classification.findall('{%s}class' % (xmlns,))
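As a possible shortcut for the original goal of reading className and p, the sketch below combines a './/' descendant search with the namespaces mapping that find and findall accept. This is an alternative to the step-by-step descent, not something from the answer itself. The prefix 'u' is an arbitrary choice, and xml_text is a trimmed copy of the response from the question without the XML declaration.

```python
from xml.etree.ElementTree import fromstring

# Trimmed copy of the uClassify response shown in the question.
xml_text = (
    '<uclassify xmlns="http://api.uclassify.com/1/ResponseSchema" version="1.01">'
    '<status success="true" statusCode="2000"/>'
    '<readCalls><classify id="thing"><classification textCoverage="0">'
    '<class className="Astronomy" p="0.333333"/>'
    '<class className="Biology" p="0.333333"/>'
    '<class className="Mathematics" p="0.333333"/>'
    '</classification></classify></readCalls>'
    '</uclassify>'
)

root = fromstring(xml_text)

# Map an arbitrary prefix to the namespace URI used by the document.
ns = {'u': 'http://api.uclassify.com/1/ResponseSchema'}

# Collect (className, p) pairs from every class element at any depth.
results = [
    (cls.get('className'), float(cls.get('p')))
    for cls in root.findall('.//u:class', ns)
]

print(results)
```

Attribute names such as className and p are not namespaced here, so get() takes them without any prefix.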

Source

2012-08-09 03:53:04 stderr
