Python and ElementTree: return "inner XML" excluding the parent element

Using ElementTree in Python 2.6, what's a good way to fetch the XML (as a string) inside a particular element, like what you can do in HTML and JavaScript with innerHTML?

Here's a simplified sample of the XML node I'm starting with:

<label attr="foo" attr2="bar">This is some text <a href="foo.htm">and a link</a> in embedded HTML</label>

I'd like to end up with this string:

This is some text <a href="foo.htm">and a link</a> in embedded HTML

I've tried iterating over the parent node and concatenating the tostring() of the children, but that gave me only the subnodes:

# returns only subnodes (e.g. <a href="foo.htm">and a link</a>) 
''.join([et.tostring(sub, encoding="utf-8") for sub in node])

I can hack up a solution using regular expressions, but I was hoping there would be something less hacky than this:

re.sub("</\w+?>\s*?$", "", re.sub("^\s*?<\w*?>", "", et.tostring(node, encoding="utf-8")))

Source

2010-08-09 Justin Grant

How about:

from xml.etree import ElementTree as ET 

xml = '<root>start here<child1>some text<sub1/>here</child1>and<child2>here as well<sub2/><sub3/></child2>end here</root>' 
root = ET.fromstring(xml) 

def content(tag): 
    # tag.text is None when the element has no leading text 
    return (tag.text or '') + ''.join(ET.tostring(e) for e in tag) 

print content(root) 
print content(root.find('child2'))

This results in:

start here<child1>some text<sub1 />here</child1>and<child2>here as well<sub2 /><sub3 /></child2>end here 
here as well<sub2 /><sub3 />
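
For anyone on Python 3, here is a minimal sketch of the same idea applied to the label example from the question. It assumes Python 3, where ET.tostring(..., encoding="unicode") returns a str; the name inner_xml is made up for illustration and is not part of ElementTree. The leading text is escaped by hand because the parser has already unescaped it, whereas tostring() re-escapes the text inside the children.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(element):
    """Return the markup inside `element`, without its own start and end tags.

    Hypothetical helper for illustration; not an ElementTree API.
    """
    # element.text is None when there is no text before the first child.
    # The parser has unescaped it, so escape it again for output.
    parts = [escape(element.text or "")]
    for child in element:
        # tostring() serializes the child and the tail text that follows it.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


label = ET.fromstring(
    '<label attr="foo" attr2="bar">This is some text '
    '<a href="foo.htm">and a link</a> in embedded HTML</label>'
)
result = inner_xml(label)
print(result)
# This is some text <a href="foo.htm">and a link</a> in embedded HTML
```

The element's own tail is deliberately left out, since that text sits after the closing tag and belongs to the parent's content.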

Source

2010-08-10 04:34:30

The following worked for me:

from xml.etree import ElementTree as etree 
xml = '<root>start here<child1>some text<sub1/>here</child1>and<child2>here as well<sub2/><sub3/></child2>end here</root>' 
dom = etree.XML(xml) 

(dom.text or '') + ''.join(map(etree.tostring, dom)) + (dom.tail or '') 
# 'start here<child1>some text<sub1 />here</child1>and<child2>here as well<sub2 /><sub3 /></child2>end here'

dom.text or '' is used to get the text at the start of the root element. If there is no text, dom.text is None.

Note that the result is not valid XML - a valid XML document must have exactly one root element.
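
To illustrate that point, here is a small sketch (assuming Python 3; the wrapper element name is arbitrary and only for illustration). Parsing the bare fragment fails, while the same fragment parses once it is wrapped in a single root element.

```python
from xml.etree import ElementTree as ET

# A fragment with text and two sibling elements at the top level.
fragment = "start here<child1>some text</child1>and<child2/>end here"

# Parsing the bare fragment fails because it has no single root element.
try:
    ET.fromstring(fragment)
    parsed_bare = True
except ET.ParseError:
    parsed_bare = False

# Wrapping it in a throwaway root element makes it parseable again.
wrapped = ET.fromstring("<wrapper>" + fragment + "</wrapper>")
child_tags = [child.tag for child in wrapped]

print(parsed_bare)
print(child_tags)
```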

Have a look at the ElementTree docs about mixed content.

Using Python 2.6.5 on Ubuntu 10.04.

Source

2010-08-09 20:27:04

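Hi Emil - your solution works OK if all of the text is inside child elements, but it breaks in my case, where text sits directly inside the parent element. The note about mixed content clearly applies here, although I'm not yet sure how to combine the head, the tails, and the child elements to emit a coherent string. – 2010-08-09 20:50:09

Close... but etree.tostring() doesn't include the tail of each child element. And I think the final dom.tail isn't needed, since that is the string after an element, not inside it. – 2010-08-09 20:55:58

I don't seem to understand you, Justin - 'start here', 'and', and 'end here' are the text inside the root element, correct? The snippet above may need some fiddling - you could create some test cases and improve it - see the linked documentation on how to handle mixed content. – 2010-08-10 08:44:14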
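
The questions about tails raised in these comments can be checked with a short test. This is a sketch that assumes Python 3 and the standard library's xml.etree.ElementTree, using the same sample XML as the answers. In that implementation tostring() does append the element's tail after its closing tag, and the tail is stored on the child even though the text belongs to the parent's content. Clearing the tail on a shallow copy is just one possible way to serialize an element without it.

```python
import copy
from xml.etree import ElementTree as ET

xml = ('<root>start here<child1>some text<sub1/>here</child1>and'
       '<child2>here as well<sub2/><sub3/></child2>end here</root>')
root = ET.fromstring(xml)
child1 = root.find("child1")

# tostring() appends the element's tail ("and") after its closing tag.
serialized = ET.tostring(child1, encoding="unicode")

# The tail is stored on child1, but it is text inside root, not inside child1.
own_tail = child1.tail

# To serialize child1 without its tail, clear the tail on a shallow copy
# so that the original tree is left unchanged.
detached = copy.copy(child1)
detached.tail = None
without_tail = ET.tostring(detached, encoding="unicode")

print(serialized)
print(without_tail)
```

Since the parsed root element has no tail of its own (root.tail is None), appending dom.tail is harmless for the root, but for a nested element it would pull in text from outside that element.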
