2016-08-05 95 views
0

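I have been searching the Python documentation for a way to get the tag names from an XML file, but without success. Given the XML file below, I would like to get the country name tags and all of their associated child tags. Does anyone know how this is done? How can I get all the tags in an XML file using Python?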

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
+0

Look into the BeautifulSoup4 library. – Keozon

Answers

1

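Consider using ElementTree's iterparse() and building a nested list of tag and text pairs. Conditional if logic groups the items of each country together and leaves out elements that have no text, and replace() cleans out the line breaks and extra whitespace that iterparse() picks up: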

import xml.etree.ElementTree as et

path = 'countries.xml'  # assumed file name; point this at your own XML file

data = []
for ev, el in et.iterparse(path):
    # iterparse() yields each element once its closing tag has been read
    if el.tag == 'country':
        inner = []
        # attributes of <country> itself, e.g. name="Liechtenstein"
        for name, value in el.items():
            inner.append([el.tag + '-' + name, value.replace('\n', '').strip()])
        for child in el:
            # skip children without text, such as the empty <neighbor/> elements
            if child.text is not None:
                inner.append([child.tag, child.text.replace('\n', '').strip()])
            # attributes of each child; strip() keeps inner spaces ('Costa Rica')
            for name, value in child.items():
                inner.append([child.tag + '-' + name, value.replace('\n', '').strip()])
        data.append(inner)

print(data)
# Output (wrapped here for readability):
# [[['country-name', 'Liechtenstein'], ['rank', '1'], ['year', '2008'], ['gdppc', '141100'],
#   ['neighbor-name', 'Austria'], ['neighbor-direction', 'E'],
#   ['neighbor-name', 'Switzerland'], ['neighbor-direction', 'W']],
#  [['country-name', 'Singapore'], ['rank', '4'], ['year', '2011'], ['gdppc', '59900'],
#   ['neighbor-name', 'Malaysia'], ['neighbor-direction', 'N']],
#  [['country-name', 'Panama'], ['rank', '68'], ['year', '2011'], ['gdppc', '13600'],
#   ['neighbor-name', 'Costa Rica'], ['neighbor-direction', 'W'],
#   ['neighbor-name', 'Colombia'], ['neighbor-direction', 'E']]]
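
If only the tag names are needed rather than the text values, a shorter variant might look like the sketch below. It assumes the document is small enough to parse in one go with ET.fromstring(), and it inlines a shortened copy of the question's XML so that it runs on its own. The variable names XML and child_tags are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML (an assumption made to keep this runnable).
XML = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <neighbor name="Costa Rica" direction="W"/>
    </country>
</data>"""

root = ET.fromstring(XML)

# Map each country's name attribute to the tag names of its direct children.
child_tags = {
    country.get('name'): [child.tag for child in country]
    for country in root.findall('country')
}
print(child_tags)
# {'Liechtenstein': ['rank', 'year', 'gdppc', 'neighbor', 'neighbor'], 'Panama': ['rank', 'neighbor']}
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring(XML).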
-1

Look into Python's built-in XML functionality: traverse the document recursively and collect all the tags in a set.
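
A minimal sketch of that idea, assuming xml.etree.ElementTree is the built-in module meant here. The helper name collect_tags and the abbreviated inline XML are made up for illustration and are not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the question's XML, inlined so the sketch is self-contained.
XML = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""


def collect_tags(element, seen=None):
    """Recursively walk the tree and return the set of all tag names."""
    if seen is None:
        seen = set()
    seen.add(element.tag)
    for child in element:
        collect_tags(child, seen)
    return seen


root = ET.fromstring(XML)
tags = collect_tags(root)
print(sorted(tags))
# ['country', 'data', 'gdppc', 'neighbor', 'rank', 'year']
```

The same set can be built without explicit recursion, since Element.iter() already walks the whole subtree: {el.tag for el in root.iter()}.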
