2013-04-23 25 views
0

Converting strings to nested XML data

I have the following strings:

"Sweden, Västmanland, Västerås" 
"Sweden, Dalarna, Leksand" 
"Ireland, Cork, Cobh" 
"Ireland, Clare, Boston" 
"Ireland, Cork, Baltimore" 
"Sweden, Dalarna, Mora" 

I would like to convert them into XML like this:

<?xml version="1.0" ?>
<data>
    <country name = "Ireland">
        <region name = "Clare">
            <settlement name = "Boston"/>
        </region>
        <region name = "Cork">
            <settlement name = "Baltimore"/>
            <settlement name = "Cobh"/>
        </region>
    </country>

    <country name = "Sweden">
        <region name = "Dalarna">
            <settlement name = "Leksand"/>
            <settlement name = "Mora"/>
        </region>
        <region name = "Västmanland">
            <settlement name = "Västerås"/>
        </region>
    </country>
</data>

Which built-in Python 3 libraries could help me do this conversion, so that I don't reinvent the wheel unnecessarily?

+2

That is not valid XML markup. You need to use attributes or XML text; a value with no attribute name is not valid. – 2013-04-23 09:07:06

+3

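This is another somewhat non-constructive question (you are basically asking for recommendations on libraries and approaches). Have you tried anything yourself? What problems did you run into? – 2013-04-23 09:09:44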

+0

"Sorting" does not apply to XML. Unless you mean grouping. – 2013-04-23 09:18:12

Answers

2
import xml.etree.ElementTree as ET
from collections import defaultdict

strings = ["Sweden, Västmanland, Västerås",
           "Sweden, Dalarna, Leksand",
           "Ireland, Cork, Cobh",
           "Ireland, Clare, Boston",
           "Ireland, Cork, Baltimore",
           "Sweden, Dalarna, Mora"]

dd = defaultdict(lambda: defaultdict(list))

for s in strings:
    a, b, c = s.split(', ')
    dd[a][b].append(c)

root = ET.Element('data')

for c, regions in dd.items():
    country = ET.SubElement(root, 'country', {'name': c})
    for r, settlements in regions.items():
        region = ET.SubElement(country, 'region', {'name': r})
        for s in settlements:
            settlement = ET.SubElement(region, 'settlement', {'name': s})


import xml.dom.minidom # just to pretty print for this example
print(xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml())

Output (countries and regions appear in dict iteration order, so the order may differ depending on your Python version):

<?xml version="1.0" ?>
<data>
    <country name="Ireland">
        <region name="Cork">
            <settlement name="Cobh"/>
            <settlement name="Baltimore"/>
        </region>
        <region name="Clare">
            <settlement name="Boston"/>
        </region>
    </country>
    <country name="Sweden">
        <region name="Dalarna">
            <settlement name="Leksand"/>
            <settlement name="Mora"/>
        </region>
        <region name="Västmanland">
            <settlement name="Västerås"/>
        </region>
    </country>
</data>
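
The XML in the question lists countries, regions and settlements alphabetically, which the grouping above does not guarantee. A minimal sketch of one way to get that order is to wrap each level in `sorted()`. The helper name `build_sorted_tree` is made up for this illustration, and `ET.indent` is only available on Python 3.9 and later, so the call is guarded.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_sorted_tree(strings):
    """Group 'country, region, settlement' strings into a sorted <data> tree.

    Hypothetical helper for illustration; not part of the original answer.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for s in strings:
        # Split on commas and trim the whitespace around each field.
        country, region, settlement = (part.strip() for part in s.split(','))
        grouped[country][region].append(settlement)

    root = ET.Element('data')
    # sorted() at each level produces alphabetical order by code point.
    for country in sorted(grouped):
        country_el = ET.SubElement(root, 'country', {'name': country})
        for region in sorted(grouped[country]):
            region_el = ET.SubElement(country_el, 'region', {'name': region})
            for settlement in sorted(grouped[country][region]):
                ET.SubElement(region_el, 'settlement', {'name': settlement})
    return root


strings = ["Sweden, Västmanland, Västerås",
           "Sweden, Dalarna, Leksand",
           "Ireland, Cork, Cobh",
           "Ireland, Clare, Boston",
           "Ireland, Cork, Baltimore",
           "Sweden, Dalarna, Mora"]

root = build_sorted_tree(strings)
if hasattr(ET, 'indent'):  # ET.indent was added in Python 3.9
    ET.indent(root)
print(ET.tostring(root, encoding='unicode'))
```

Note that `sorted()` compares by Unicode code point, which is fine for this sample but is not locale-aware collation.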
0

You can parse your input into a dictionary as follows:

strings = ["Sweden, Vastmanland, Vasteras",
           "Sweden, Dalarna, Leksand",
           "Ireland, Cork, Cobh",
           "Ireland, Clare, Boston",
           "Ireland, Cork, Baltimore",
           "Sweden, Dalarna, Mora"]

d = {}
for s in strings:
    # Each string has the form "country, region, settlement"
    tmp = s.split(", ")
    country = tmp[0].strip()
    region = tmp[1].strip()
    settlement = tmp[2].strip()

    if d.get(country):
        if d[country].get(region):
            d[country][region].append(settlement)
        else:
            d[country][region] = [settlement]
    else:
        d[country] = {region: [settlement]}

for k, v in d.items():
    print(k, v)

On Python 3.7 or later, where dicts preserve insertion order, this gives the following output:

Sweden {'Vastmanland': ['Vasteras'], 'Dalarna': ['Leksand', 'Mora']}
Ireland {'Cork': ['Cobh', 'Baltimore'], 'Clare': ['Boston']}

Now you can easily convert this dictionary into an XML string.
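
One possible way to do that last step, sketched with the standard library's `xml.etree.ElementTree`. The function name `dict_to_xml` and the small sample dictionary are assumptions made for this illustration; the element and attribute names follow the XML in the question.

```python
import xml.etree.ElementTree as ET


def dict_to_xml(d):
    """Serialize {country: {region: [settlement, ...]}} to an XML string.

    Hypothetical helper for illustration only.
    """
    root = ET.Element('data')
    for country, regions in d.items():
        country_el = ET.SubElement(root, 'country', name=country)
        for region, settlements in regions.items():
            region_el = ET.SubElement(country_el, 'region', name=region)
            for settlement in settlements:
                ET.SubElement(region_el, 'settlement', name=settlement)
    # encoding='unicode' makes tostring() return str instead of bytes.
    return ET.tostring(root, encoding='unicode')


# Sample input in the same shape as the dictionary built above.
d = {'Ireland': {'Clare': ['Boston'], 'Cork': ['Cobh', 'Baltimore']}}
xml_text = dict_to_xml(d)
print(xml_text)
```

The result is a single unindented line; it can be passed through `xml.dom.minidom.parseString(...).toprettyxml()` if a readable layout is needed.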

That said, jamylak's answer is better.