Finding elements by attribute with lxml

I need to parse an XML file to extract some data. I only need the elements that have certain attributes. Here is an example of the document:

```xml
<root>
    <articles>
        <article type="news">
            <content>some text</content>
        </article>
        <article type="info">
            <content>some text</content>
        </article>
        <article type="news">
            <content>some text</content>
        </article>
    </articles>
</root>
```

Here I want to get only the articles whose type is "news". What is the most efficient and elegant way to do this with lxml?

I tried using the find method, but it is not very pretty:

```python
from lxml import etree

f = etree.parse("myfile")
root = f.getroot()
articles = root.getchildren()[0]
article_list = articles.findall('article')
for article in article_list:
    if "type" in article.keys():
        if article.attrib['type'] == 'news':
            content = article.find('content')
            content = content.text
```
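For comparison, here is a minimal tidy-up of the loop above that keeps the same find()/findall() approach. Element.get() returns None when the attribute is missing, so the separate keys() check is unnecessary. findtext() returns the text of the first matching child. As an assumption, the sample document is inlined as a string rather than read from "myfile", so the snippet runs on its own.

```python
from lxml import etree

# Sample document from the question, inlined instead of parsed from "myfile"
XML = """<root>
    <articles>
        <article type="news">
            <content>some text</content>
        </article>
        <article type="info">
            <content>some text</content>
        </article>
        <article type="news">
            <content>some text</content>
        </article>
    </articles>
</root>"""

root = etree.fromstring(XML)
contents = []
for article in root.find("articles").findall("article"):
    # get() returns None if there is no "type" attribute, so no keys() check is needed
    if article.get("type") == "news":
        # findtext() returns the text of the first <content> child
        contents.append(article.findtext("content"))

print(contents)
```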


2011-02-23 Jérôme Pigeot

You can use XPath, for example root.xpath("//article[@type='news']").

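This XPath expression returns a list of all <article/> elements whose "type" attribute has the value "news". You can then iterate over that list to do whatever you need, or pass it on wherever you like.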
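A short sketch of iterating over that result. Each item in the list is an ordinary lxml element, so get() and findtext() work on it as usual. The compact one-line-per-article version of the sample document is used only to keep the snippet short.

```python
from lxml import etree

root = etree.fromstring("""<root>
  <articles>
    <article type="news"><content>some text</content></article>
    <article type="info"><content>some text</content></article>
    <article type="news"><content>some text</content></article>
  </articles>
</root>""")

# xpath() returns a plain Python list of the matching <article> elements
news_articles = root.xpath("//article[@type='news']")

for article in news_articles:
    # Each match is a regular element, so attribute and child lookups still work
    print(article.get("type"), article.findtext("content"))
```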

To get just the text content, you can extend the XPath like this:

```python
from lxml import etree

root = etree.fromstring("""
<root>
    <articles>
        <article type="news">
            <content>some text</content>
        </article>
        <article type="info">
            <content>some text</content>
        </article>
        <article type="news">
            <content>some text</content>
        </article>
    </articles>
</root>
""")

print(root.xpath("//article[@type='news']/content/text()"))
```

This prints ['some text', 'some text']. If you only want the content elements, the expression would be "//article[@type='news']/content", and so on.
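When the same query runs repeatedly, lxml can also compile the expression once with etree.XPath and take the attribute value as an XPath variable instead of pasting it into the string. This is a minimal sketch. The variable name $kind and the function name find_contents are chosen here for illustration, and whether precompiling helps noticeably depends on how often the query runs.

```python
from lxml import etree

root = etree.fromstring("""<root>
  <articles>
    <article type="news"><content>some text</content></article>
    <article type="info"><content>some text</content></article>
    <article type="news"><content>some text</content></article>
  </articles>
</root>""")

# Compile once; $kind is an XPath variable supplied as a keyword argument at call time
find_contents = etree.XPath("//article[@type=$kind]/content")

# Returns the <content> elements themselves, not their text
news_contents = find_contents(root, kind="news")
info_texts = [c.text for c in find_contents(root, kind="info")]
print(len(news_contents), info_texts)
```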


2011-02-23 15:36:09

FYI, you can achieve the same result with findall:

```python
from lxml import etree

root = etree.fromstring("""
<root>
    <articles>
        <article type="news">
            <content>some text</content>
        </article>
        <article type="info">
            <content>some text</content>
        </article>
        <article type="news">
            <content>some text</content>
        </article>
    </articles>
</root>
""")

articles = root.find("articles")
article_list = articles.findall("article[@type='news']/content")
for a in article_list:
    print(a.text)
```
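The ElementPath syntax accepted by find() and findall() is only a subset of XPath. It does, however, support [@attr='value'] predicates and the ".//" descendant step, so the intermediate find("articles") call can be skipped. The sketch below also uses iterfind(), which yields matches lazily instead of building a list.

```python
from lxml import etree

root = etree.fromstring("""<root>
  <articles>
    <article type="news"><content>some text</content></article>
    <article type="info"><content>some text</content></article>
    <article type="news"><content>some text</content></article>
  </articles>
</root>""")

# ".//" searches all descendants, so there is no need to locate <articles> first
news = root.findall(".//article[@type='news']")

# iterfind() yields matching elements one at a time
texts = [a.findtext("content") for a in root.iterfind(".//article[@type='news']")]
print(texts)
```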


2015-02-02 10:09:55 Kjir
