Extracting values

I know how to extract values from XML in the following format:

<note> 
    <col1>Tove</col1> 
    <col2>J</col2> 
    <test2> 
     <a> a </a> 
     <b> b </b> 
     <c> c </c> 
     <d> d </d> 
    </test2> 
    <code 
     a="1" 
     b="2" 
     c="3" 
    /> 
    <heading>Reminder</heading> 
    <body>Don't forget me this weekend!</body> 
</note>

I have been extracting the values like this:

for a in xmls.iter():  # iter() replaces the deprecated getiterator()
    b = a.find("col1")  # or "col2"
    if b is not None:
        print(b.text)  # this extracts the value
        break

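My problem is that I need to extract the values of the test2 and code nodes, but with the approach above the output I get is None.

Expected output

Ideally something like the following, although getting the node values directly, like a,b,c,d,1,2,3, would be best: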

<a> a </a>
<b> b </b>
<c> c </c>
<d> d </d>

and

a="1"
b="2"
c="3"

If we have the target node name, what is the way to extract values from these different kinds of XML nodes?

2015-12-30 NoobEditor

I would use lxml.etree, with .xpath() to select the nodes and .attrib to get the attribute values:

import lxml.etree as ET 

data = """<note> 
    <col1>Tove</col1> 
    <col2>J</col2> 
    <test2> 
     <a> a </a> 
     <b> b </b> 
     <c> c </c> 
     <d> d </d> 
    </test2> 
    <code 
     a="1" 
     b="2" 
     c="3" 
    /> 
    <heading>Reminder</heading> 
    <body>Don't forget me this weekend!</body> 
</note> 
""" 

tree = ET.fromstring(data) 

for note in tree.xpath("//note"): 
    test2_values = [value.strip() for value in note.xpath(".//test2/*/text()")] 
    code_attrs = note.find("code").attrib 

    print(test2_values) 
    print(code_attrs)

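Here we basically iterate over all note nodes (assuming there may be more than one), get the text of every node under the inner test2 node, and get all the attributes that the code node has.

Prints: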

['a', 'b', 'c', 'd'] 
{'b': '2', 'c': '3', 'a': '1'}
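
If lxml is not available, roughly the same extraction can be sketched with the standard library's xml.etree.ElementTree. This is only a sketch under assumptions: extract_values is a hypothetical helper name, not part of any library, and it looks only at the first descendant with the given tag. It also shows why the approach in the question gives None for code: that element has attributes but no text.

```python
import xml.etree.ElementTree as ET

data = """<note>
    <col1>Tove</col1>
    <test2>
        <a> a </a>
        <b> b </b>
        <c> c </c>
        <d> d </d>
    </test2>
    <code a="1" b="2" c="3" />
</note>"""


def extract_values(root, tag):
    """Hypothetical helper: collect text, child texts and attributes
    of the first descendant of root named tag (None if not found)."""
    node = root.find(".//" + tag)
    if node is None:
        return None
    return {
        # .text is None for an empty element such as <code .../>
        "text": (node.text or "").strip(),
        "children": [(child.text or "").strip() for child in node],
        "attributes": dict(node.attrib),
    }


root = ET.fromstring(data)
col1 = extract_values(root, "col1")
test2 = extract_values(root, "test2")
code = extract_values(root, "code")

print(col1["text"])
print(test2["children"])
print(code["attributes"])
```

The helper returns all three kinds of value at once, so the caller can pick whichever is non-empty for the target node.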

2015-12-30 06:11:57 alecxe

Cool... makes sense. From an iteration-time point of view, is this a heavy process, assuming it is parsing a large XML file? – NoobEditor

@NoobEditor It depends on how big the input is (don't optimize prematurely, as you may remember). Also, if needed, you can parse the XML iteratively, see: http://stackoverflow.com/questions/9856163/using-lxml-and-iterparse-to-parse-a-big-1gb-xml-file and http://stackoverflow.com/questions/324214/what-is-the-fastest-way-to-parse-large-xml-docs-in-python. – alecxe

Thanks, captain... over and over again! :) – NoobEditor
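
For reference, the iterative parsing mentioned in the comments could look roughly like the sketch below. It uses the standard library's xml.etree.ElementTree.iterparse rather than lxml, and it assumes a hypothetical wrapper element notes holding many note elements; iter_notes is an illustrative helper name, not an existing API.

```python
import io
import xml.etree.ElementTree as ET

# Assumed layout: many <note> elements inside a <notes> wrapper.
xml_bytes = b"""<notes>
  <note><test2><a> a </a><b> b </b></test2><code a="1" b="2"/></note>
  <note><test2><a> c </a></test2><code a="3"/></note>
</notes>"""


def iter_notes(source):
    """Hypothetical helper: yield (test2 child texts, code attributes)
    for each <note>, one at a time, as the parser reaches its end tag."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "note":
            continue
        texts = [(child.text or "").strip()
                 for child in elem.findall("./test2/*")]
        code = elem.find("code")
        attrs = dict(code.attrib) if code is not None else {}
        yield texts, attrs
        # Drop the processed subtree so memory use stays low on big files.
        elem.clear()


results = list(iter_notes(io.BytesIO(xml_bytes)))
print(results)
```

For a real file, a path or an open file object would be passed in place of the io.BytesIO buffer. Whether this is worth doing depends on the file size, as noted in the comment above.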
