How to selectively read an XML file

How can I selectively read the XML below so that, when the reader encounters tags with the same name, it skips the first one?

<paraphrase_candidates source_description="id:9249">  
    <annotation author="87" is_paraphrase="true" source_description="id:18689" > 
     <phenomenon type="lex_same_polarity" projection="local"          source_description="id:5528"> 
      <snippet id="16488" > 
       <scope offset="125" length="4"/> 
      </snippet> 
      <snippet id="16489" > 
       <scope offset="71" length="11"/> 
      </snippet> 
     </phenomenon> 
     <phenomenon type="syn_diathesis" source_description="id:5536"> 
      <snippet id="16488" > 
       <scope offset="32" length="92"/> 
      </snippet> 
       <scope offset="0" length="70"/> 
     </phenomenon> 
    </annotation> 
</paraphrase_candidates>

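Specifically, I want to skip the first phenomenon tag and retrieve the attributes of the scope tags under the second phenomenon tag.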

My attempt:
for x in root.findall('scope'): 
    print x.attrib[0]

Output: nothing

Expected output: {offset="32" length="92"} and {offset="0" length="70"}


2017-04-03 Boby

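root.findall('scope') returns nothing because scope is not a direct child of the root of your XML: a bare tag name only matches direct children of the element findall is called on. Use .//scope instead (see the docs) to get all scope elements anywhere in the XML.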
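A minimal sketch of the difference, run against a trimmed-down copy of the question's XML (the shortened document is only an illustration, not the asker's full file):

```python
from xml.etree import ElementTree as ET

# Trimmed-down version of the question's XML, for illustration only
raw = """<paraphrase_candidates>
  <annotation>
    <phenomenon type="lex_same_polarity">
      <snippet id="16488"><scope offset="125" length="4"/></snippet>
    </phenomenon>
  </annotation>
</paraphrase_candidates>"""

root = ET.fromstring(raw)

# 'scope' only looks at direct children of <paraphrase_candidates>, so nothing matches
direct = root.findall('scope')

# './/scope' searches every descendant of the root, at any depth
anywhere = root.findall('.//scope')

print(len(direct), len(anywhere))  # 0 1
```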

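To get only the scope elements under the second phenomenon, you can use a positional index predicate (note that XPath position indices start at 1, not 0). In ElementTree, phenomenon[2] matches a phenomenon element that is the second phenomenon child of its parent: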

root.findall('.//phenomenon[2]//scope')
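If the phenomenon you want is better identified by its type attribute than by its position, ElementTree's limited XPath also supports [@attrib='value'] predicates. A sketch, assuming that type="syn_diathesis" reliably marks the wanted element in your real data (an assumption; the question only identifies it as "the second one"):

```python
from xml.etree import ElementTree as ET

raw = """<paraphrase_candidates>
  <annotation>
    <phenomenon type="lex_same_polarity">
      <snippet id="16488"><scope offset="125" length="4"/></snippet>
    </phenomenon>
    <phenomenon type="syn_diathesis">
      <snippet id="16488"><scope offset="32" length="92"/></snippet>
      <scope offset="0" length="70"/>
    </phenomenon>
  </annotation>
</paraphrase_candidates>"""

root = ET.fromstring(raw)

# Select the phenomenon by attribute value instead of by position
by_type = root.findall(".//phenomenon[@type='syn_diathesis']//scope")

# The positional form from the answer, for comparison (1-based index)
by_position = root.findall('.//phenomenon[2]//scope')

for scope in by_type:
    print(scope.get('offset'), scope.get('length'))
```

Both paths select the same two scope elements here; the attribute form keeps working if another phenomenon is later inserted before the one you want.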

Test code:

>>> raw = '''<paraphrase_candidates source_description="id:9249">        
...  <annotation author="87" is_paraphrase="true" source_description="id:18689" >   
...   <phenomenon type="lex_same_polarity" projection="local"
...        source_description="id:5528">
...    <snippet id="16488" >               
...     <scope offset="125" length="4"/>           
...    </snippet>                 
...    <snippet id="16489" >               
...     <scope offset="71" length="11"/>           
...    </snippet>                 
...   </phenomenon>                  
...   <phenomenon type="syn_diathesis" source_description="id:5536">     
...    <snippet id="16488" >               
...     <scope offset="32" length="92"/>           
...    </snippet>                 
...     <scope offset="0" length="70"/>           
...   </phenomenon>                  
...  </annotation>                   
... </paraphrase_candidates>'''                
>>> from xml.etree import ElementTree as et             
>>> root = et.fromstring(raw)                 
>>> for x in root.findall('.//phenomenon[2]//scope'): 
...  print x.attrib 
... 
{'length': '92', 'offset': '32'} 
{'length': '70', 'offset': '0'}
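The session above uses Python 2's print statement. Without relying on an XPath predicate at all, you can also collect every phenomenon element and slice off the first one, which maps directly onto "skip the first, read the rest". A Python 3 sketch; note that, unlike phenomenon[2], it reads every phenomenon after the first, not only the second:

```python
from xml.etree import ElementTree as ET

raw = """<paraphrase_candidates>
  <annotation>
    <phenomenon type="lex_same_polarity">
      <snippet id="16488"><scope offset="125" length="4"/></snippet>
    </phenomenon>
    <phenomenon type="syn_diathesis">
      <snippet id="16488"><scope offset="32" length="92"/></snippet>
      <scope offset="0" length="70"/>
    </phenomenon>
  </annotation>
</paraphrase_candidates>"""

root = ET.fromstring(raw)

# All <phenomenon> elements, in document order
phenomena = root.findall('.//phenomenon')

results = []
# Skip the first <phenomenon>, then walk every <scope> beneath each remaining one
for phenomenon in phenomena[1:]:
    for scope in phenomenon.iter('scope'):
        results.append({'offset': scope.get('offset'), 'length': scope.get('length')})

for r in results:
    print(r)
# {'offset': '32', 'length': '92'}
# {'offset': '0', 'length': '70'}
```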


2017-04-03 10:46:09 har07

Thanks for the solution, it works well. – Boby

How does this work if the XML file has an http at the front? '' – Boby

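@Boby, search for "xpath default namespace". It's not called "an http", it's called a namespace. Please don't use comments to ask follow-up questions: ask a new question. Oh, and if the answer works, you should accept it (click the check mark). –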
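For the namespace follow-up: if the root element declares a default namespace (xmlns="..."), ElementTree stores every tag as {uri}localname, so the unqualified paths above match nothing. A sketch, assuming a hypothetical namespace URI http://example.com/paraphrase (substitute whatever your file actually declares). Passing a prefix-to-URI mapping as the second argument of findall is the standard ElementTree mechanism; the prefix name p is arbitrary:

```python
from xml.etree import ElementTree as ET

# Hypothetical namespace URI, for illustration only
NS_URI = 'http://example.com/paraphrase'

raw = f"""<paraphrase_candidates xmlns="{NS_URI}">
  <annotation>
    <phenomenon type="lex_same_polarity">
      <snippet id="16488"><scope offset="125" length="4"/></snippet>
    </phenomenon>
    <phenomenon type="syn_diathesis">
      <snippet id="16488"><scope offset="32" length="92"/></snippet>
      <scope offset="0" length="70"/>
    </phenomenon>
  </annotation>
</paraphrase_candidates>"""

root = ET.fromstring(raw)

# Without the namespace, the plain path finds nothing
plain = root.findall('.//phenomenon[2]//scope')

# Map an arbitrary prefix to the URI and use it on every tag in the path
ns = {'p': NS_URI}
qualified = root.findall('.//p:phenomenon[2]//p:scope', ns)

print(len(plain), [s.get('offset') for s in qualified])  # 0 ['32', '0']
```

Unprefixed attributes such as offset are not namespaced, so scope.get('offset') still works unchanged.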
