Parsing XML with lxml and ElementTree

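I'm trying to parse an XML document to return the <input> nodes that have a ref attribute. A toy example works, but the real document returns an empty array even though it should contain a match.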

Toy example

from lxml import etree
tree = etree.XML('<body><input ref="blabla"><label>Cats</label></input><input ref="blabla"><label>Dogs</label></input><input ref="blabla"><label>Birds</label></input></body>')
# I can return the relevant input nodes with:
print(len(tree.findall(".//input[@ref]")))
# prints 3

But for some reason the same query fails on the following (reduced) document:

example.xml

<?xml version="1.0"?> 
<h:html xmlns="http://www.w3.org/2002/xforms" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 
    <h:head> 
    <h:title>A title</h:title> 
    </h:head> 
    <h:body> 
    <group ref="blabla"> 
     <label>Group 1</label> 
     <input ref="blabla"> 
     <label>Field 1</label> 
     </input> 
    </group> 
    </h:body> 
</h:html>

Script

from lxml import etree
# Read as bytes so lxml handles the XML declaration and encoding itself
with open("example.xml", "rb") as myfile:
    xml = myfile.read()
tree = etree.XML(xml)
print(len(tree.findall(".//input[@ref]")))
# prints 0

Any ideas why this fails, and how to fix it? I suspect it may have something to do with the XML header. Any help is much appreciated.
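One way to test the header hypothesis is to run the same query on the reduced document with and without its <?xml ...?> declaration and compare the counts. This sketch uses the standard-library xml.etree.ElementTree instead of lxml (an assumption, so it runs without third-party packages); findall treats this query the same way in both libraries.

```python
import xml.etree.ElementTree as ET

# The reduced example.xml from the question, embedded as a string.
DOC = """<?xml version="1.0"?>
<h:html xmlns="http://www.w3.org/2002/xforms" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <h:head>
    <h:title>A title</h:title>
  </h:head>
  <h:body>
    <group ref="blabla">
      <label>Group 1</label>
      <input ref="blabla">
        <label>Field 1</label>
      </input>
    </group>
  </h:body>
</h:html>"""

# Same document with the XML declaration line stripped off.
DOC_NO_DECL = DOC.split("\n", 1)[1]


def count_inputs(text):
    """Count matches for the un-namespaced query used in the question."""
    root = ET.fromstring(text)
    return len(root.findall(".//input[@ref]"))


with_decl = count_inputs(DOC)
without_decl = count_inputs(DOC_NO_DECL)
print(with_decl, without_decl)  # 0 0: the declaration is not the cause
```

Both counts are zero, which points away from the header and towards something else in the document.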


2015-08-26 geotheory

What's the error message? What exactly fails? – refi64

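I think the problem is that the elements in your document are in specific namespaces, so the un-namespaced .findall(".//input[@ref]") expression doesn't match the input element in the document, which is actually a namespaced input element in the http://www.w3.org/2002/xforms namespace.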
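You can see this directly by printing each element's .tag: ElementTree (and lxml) report namespaced tags in {namespace-uri}localname form, so a bare "input" never equals the real tag. A minimal sketch with the standard-library parser, using a trimmed copy of the question's document:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document (namespace declarations kept).
DOC = """<h:html xmlns="http://www.w3.org/2002/xforms" xmlns:h="http://www.w3.org/1999/xhtml">
  <h:body>
    <group ref="blabla">
      <input ref="blabla"><label>Field 1</label></input>
    </group>
  </h:body>
</h:html>"""

root = ET.fromstring(DOC)

# iter() walks the root element and all its descendants in document order.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)
```

For this input the loop prints {http://www.w3.org/1999/xhtml}html and {http://www.w3.org/1999/xhtml}body, followed by the group, input and label tags, each prefixed with {http://www.w3.org/2002/xforms}.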

So maybe try this:

.findall(".//{http://www.w3.org/2002/xforms}input[@ref]")

Updated after my original answer to use the XForms namespace instead of the XHTML namespace (as another answer had already pointed out).
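If you would rather not hard-code the namespace URI, one possible alternative (not part of this answer, and it assumes Python 3.8 or later) is the {*} namespace wildcard accepted by the standard-library xml.etree.ElementTree. The trade-off is that it matches input elements from every namespace, which may be too loose for mixed XHTML+XForms documents.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document.
DOC = """<h:html xmlns="http://www.w3.org/2002/xforms" xmlns:h="http://www.w3.org/1999/xhtml">
  <h:body>
    <group ref="blabla">
      <label>Group 1</label>
      <input ref="blabla"><label>Field 1</label></input>
    </group>
  </h:body>
</h:html>"""

root = ET.fromstring(DOC)

# "{*}input" matches an <input> tag in any namespace, or in no namespace.
# The {*} wildcard is supported by xml.etree.ElementTree from Python 3.8 on.
matches = root.findall(".//{*}input[@ref]")
print(len(matches))  # 1
print(matches[0].tag)  # {http://www.w3.org/2002/xforms}input
```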


2015-08-26 00:40:50 sideshowbarker

Hi sideshowbarker. Sorry, I'm still getting an empty array. – geotheory

OK, I hadn't actually tested it, but I'm doing that now to see what I get – sideshowbarker

Aha! '.findall(".//{http://www.w3.org/2002/xforms}input[@ref]")' did the trick :) – geotheory

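As you can see from the XML, the namespace for unprefixed elements is "http://www.w3.org/2002/xforms", because the parent element h:html declares it with xmlns, without any prefix. Only the elements with the h: prefix are in the "http://www.w3.org/1999/xhtml" namespace.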

So you also need to use that namespace in your query. Example:

root.findall(".//{http://www.w3.org/2002/xforms}input[@ref]")
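An equivalent and often more readable form passes a prefix-to-URI mapping through findall's namespaces argument instead of writing the full URI in the path. The prefix "xf" below is an arbitrary, hypothetical choice; it only needs to be mapped to the right URI in the dictionary, not declared in the document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document.
DOC = """<h:html xmlns="http://www.w3.org/2002/xforms" xmlns:h="http://www.w3.org/1999/xhtml">
  <h:body>
    <group ref="blabla">
      <label>Group 1</label>
      <input ref="blabla"><label>Field 1</label></input>
    </group>
  </h:body>
</h:html>"""

# "xf" is a prefix chosen just for this query and mapped to the XForms URI.
NS = {"xf": "http://www.w3.org/2002/xforms"}

root = ET.fromstring(DOC)
inputs = root.findall(".//xf:input[@ref]", namespaces=NS)
print(len(inputs))  # 1
print(inputs[0].get("ref"))  # blabla
```

lxml's findall accepts the same namespaces argument, so tree.findall(".//xf:input[@ref]", namespaces=NS) should work unchanged in the question's script.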

Example/demo:

>>> s = """<?xml version="1.0"?> 
... <h:html xmlns="http://www.w3.org/2002/xforms" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 
... <h:head> 
...  <h:title>A title</h:title> 
... </h:head> 
... <h:body> 
...  <group ref="blabla"> 
...  <label>Group 1</label> 
...  <input ref="blabla"> 
...   <label>Field 1</label> 
...  </input> 
...  </group> 
... </h:body> 
... </h:html>""" 
>>> import xml.etree.ElementTree as ET 
>>> root = ET.fromstring(s) 
>>> root.findall(".//{http://www.w3.org/1999/xhtml}input[@ref]")
[]
>>> root.findall(".//{http://www.w3.org/2002/xforms}input[@ref]") 
[<Element '{http://www.w3.org/2002/xforms}input' at 0x02288EA0>]
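The namespace URI does not have to be copied by hand. One option (a sketch, not part of the original answer) is to collect the document's namespace declarations from iterparse's "start-ns" events and build the Clark-notation query from the default namespace. The names ns_map and default_ns are illustrative only.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"""<?xml version="1.0"?>
<h:html xmlns="http://www.w3.org/2002/xforms" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <h:body>
    <group ref="blabla">
      <input ref="blabla"><label>Field 1</label></input>
    </group>
  </h:body>
</h:html>"""

# "start-ns" events yield (prefix, uri) pairs as declarations are seen;
# the default (unprefixed) namespace is reported with prefix "".
# If a prefix is redeclared deeper in the tree, the last declaration wins here.
ns_map = {}
for _event, (prefix, uri) in ET.iterparse(io.BytesIO(DOC), events=("start-ns",)):
    ns_map[prefix] = uri

default_ns = ns_map[""]  # http://www.w3.org/2002/xforms

# Build the {uri}localname query from the discovered default namespace.
root = ET.fromstring(DOC)
matches = root.findall(f".//{{{default_ns}}}input[@ref]")
print(len(matches))  # 1
```

With lxml, root.nsmap gives a similar mapping for the root element, using None as the key for the default namespace.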


2015-08-26 01:44:06
