2017-03-01 115 views

Is there a quick way to convert the XML below into a dictionary using lxml's xpath in Python? Or some other efficient way?

<rec item="1"> 
    <tag name="atr1">random text</tag> 
    <tag name="atr2">random text</tag> 
    ..................................   
</rec> 
<rec item="2"> 
    <tag name="atr1">random text2</tag> 
    <tag name="atr2">random text2</tag> 
    ..................................   
</rec> 
<rec item="3"> 
    <tag name="atr1">random text3</tag> 
    <tag name="atr2">random text3</tag> 
    ..................................   
</rec> 

I need a dictionary like this, or some other similar representation:

dic = [ 
    {  
     'attr1':'random text', 
     'attr2':'random text' 
    }, 
    {  
     'attr1':'random text2', 
     'attr2':'random text2' 
    }, 
    {  
     'attr1':'random text3', 
     'attr2':'random text3' 
    } 
] 

Answers


You can use a list comprehension combined with a dict comprehension:

[{ tag.xpath('string(@name)') : tag.xpath('string()') for tag in record.xpath('tag')} for record in records.xpath('//rec')] 
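Written out as explicit loops, the nested comprehension above does roughly the following. This is a sketch with a shortened two-record sample (the `xml` string here is illustrative, not the asker's full data): the outer loop produces one dict per `<rec>`, and the inner loop adds one key per `<tag>`.

```python
from lxml import etree

# Shortened sample; the <records> wrapper gives the <rec> elements a single root
xml = b"""<records>
<rec item="1"><tag name="atr1">random text</tag><tag name="atr2">random text</tag></rec>
<rec item="2"><tag name="atr1">random text2</tag><tag name="atr2">random text2</tag></rec>
</records>"""

records = etree.fromstring(xml)

rec_list = []
for record in records.xpath('//rec'):
    fields = {}
    for tag in record.xpath('tag'):
        # string(@name) -> value of the name attribute; string() -> the tag's text content
        fields[tag.xpath('string(@name)')] = tag.xpath('string()')
    rec_list.append(fields)

print(rec_list)
```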

Here is a complete example:

from lxml import etree as ET 
# The <rec> elements are wrapped in a <records> root so the string is well-formed XML
xml = '''<records> 
<rec item="1"> 
    <tag name="atr1">random text</tag> 
    <tag name="atr2">random text</tag> 
    ..................................   
</rec> 
<rec item="2"> 
    <tag name="atr1">random text2</tag> 
    <tag name="atr2">random text2</tag> 
    ..................................   
</rec> 
<rec item="3"> 
    <tag name="atr1">random text3</tag> 
    <tag name="atr2">random text3</tag> 
    ..................................   
</rec> 
</records>''' 
records = ET.fromstring(xml) 
# One dict per <rec>: key = the tag's name attribute, value = the tag's text content
rec_list = [{ tag.xpath('string(@name)') : tag.xpath('string()') for tag in rec.xpath('tag') } for rec in records.xpath('rec')] 
print(rec_list) 

Output:

[{'atr1': 'random text', 'atr2': 'random text'}, {'atr1': 'random text2', 'atr2': 'random text2'}, {'atr1': 'random text3', 'atr2': 'random text3'}] 
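Since the question also asks about efficiency: for large files, a streaming variant built on lxml's `iterparse` might keep memory use lower than parsing the whole document first. This is a minimal sketch, assuming each `<tag>` holds plain text with no child elements (`tag.text` only returns the text before the first child); the in-memory `io.BytesIO` input stands in for a real file path or open binary file.

```python
import io
from lxml import etree

# Stand-in for a real file; iterparse also accepts a filename or an open binary file
data = io.BytesIO(b"""<records>
<rec item="1"><tag name="atr1">random text</tag><tag name="atr2">random text</tag></rec>
<rec item="2"><tag name="atr1">random text2</tag><tag name="atr2">random text2</tag></rec>
</records>""")

rec_list = []
# Handle each <rec> as soon as its closing tag has been parsed
for _, rec in etree.iterparse(data, events=("end",), tag="rec"):
    rec_list.append({tag.get("name"): tag.text for tag in rec.iterchildren("tag")})
    # Release the finished record's children and text
    rec.clear()
    # Also drop already-processed records still attached to the root
    while rec.getprevious() is not None:
        del rec.getparent()[0]

print(rec_list)
```

The clear-and-delete step is what keeps the tree from growing; without it, `iterparse` still builds the full document in memory.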

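It works! Now I'm looking into how to improve the output. Since I know the attribute names (name="attr1") in advance, perhaps a more efficient approach would be the following structure: – bogumbiker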


attribute_name = {'atr1','atr2'} attribute_values = [{'random text','random text'},{'random text2','random text2'},{'random text3','random text3'}] but I'm not sure what it would gain? – bogumbiker
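One caveat on the structure proposed in the comment: `{'random text', 'random text'}` is a Python set, which is unordered and collapses duplicates into a single element, so the values could no longer be matched to their names. Below is a minimal sketch of the same column-oriented idea using tuples instead, assuming every record uses the known names `atr1` and `atr2` (the sample `xml` is shortened). Whether it beats a list of dicts is a judgment call; mainly it avoids repeating the keys in every record, much like a CSV header plus rows.

```python
from lxml import etree

xml = b"""<records>
<rec item="1"><tag name="atr1">random text</tag><tag name="atr2">random text</tag></rec>
<rec item="2"><tag name="atr1">random text2</tag><tag name="atr2">random text2</tag></rec>
</records>"""

records = etree.fromstring(xml)

# Names known in advance; a tuple keeps their order (a set would not)
attribute_names = ("atr1", "atr2")

attribute_values = []
for rec in records.iterfind("rec"):
    # Map name -> text for this record, then read values in the fixed column order
    by_name = {tag.get("name"): tag.text for tag in rec.iterfind("tag")}
    # A missing tag becomes None instead of raising KeyError
    attribute_values.append(tuple(by_name.get(name) for name in attribute_names))

print(attribute_names)
print(attribute_values)
```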


You can try the following code:

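import lxml.etree 
source = lxml.etree.fromstring('xml_source_is_here')  # placeholder: pass the actual XML string 
# Pair every tag's name attribute with its text; this yields one single-key dict per <tag>, not per <rec>
[{attr:text} for attr,text in zip(source.xpath('//tag/@name'), source.xpath('//tag/text()'))] 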

Output:

[{'atr1': 'random text'}, {'atr2': 'random text'}, 
{'atr1': 'random text2'}, {'atr2': 'random text2'}, 
{'atr1': 'random text3'}, {'atr2': 'random text3'}]
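The output above is flat: one dict per `<tag>` rather than one per `<rec>`. A possible adjustment, sketched here with a shortened sample, is to run the same `zip` inside each `<rec>`. One assumption to keep in mind: `tag/text()` returns nothing for an empty `<tag>`, so names and texts could drift out of alignment in that case; the `string()`-based approach in the first answer does not have this problem.

```python
from lxml import etree

xml = b"""<records>
<rec item="1"><tag name="atr1">random text</tag><tag name="atr2">random text</tag></rec>
<rec item="2"><tag name="atr1">random text2</tag><tag name="atr2">random text2</tag></rec>
</records>"""

source = etree.fromstring(xml)

# Zip names with texts inside each <rec> so the result is one dict per record
rec_list = [
    dict(zip(rec.xpath('tag/@name'), rec.xpath('tag/text()')))
    for rec in source.xpath('//rec')
]
print(rec_list)
```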