2012-10-12 115 views

Parsing XML from a URL with Python using minidom

I have a problem parsing the XML below, which I fetch from a URL.

Sample XML:

<?xml version="1.0" encoding="utf-8"?> 
<Documents> 
    <class> 
     <mid name="yyyyyyyyyyyyy"></mid> 
     <person name="yyyyyyyyyy"></person> 
     <url name="yyyyyyyyy"></url> 
    </class> 
    <class> 
     <mid name="xxxxx"></mid> 
     <person name="xxxxxxxxxx"></person> 
     <url name="xxxxxxxxxxx"></url> 
    </class> 
</Documents> 

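Here is my Python code:

from urllib.request import urlopen
from xml.dom import minidom

def staff_list(request):
    url = 'http://path.to.url/'
    dom = minidom.parse(urlopen(url))
    person = dom.getElementsByTagName('person')
    for i in person:
        print(i.attributes['name'].value)

In the for loop I want to print the person and url tag values that belong to the same parent class element in the XML.

I tried iterating as follows, but got a "too many values to unpack" error: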

def staff_list(request):
    url = 'http://path.to.url/'
    dom = minidom.parse(urlopen(url))
    person = dom.getElementsByTagName('person')
    mid = dom.getElementsByTagName('mid')
    url = dom.getElementsByTagName('url')
    # The next line is the one that raises the unpacking error
    for i,j,k in person,mid,url:
        print(i.attributes['name'].value, j.attributes['name'].value, k.attributes['name'].value)

Any suggestions?

Answers

2

I think you want to combine the elements with zip():

for i,j,k in zip(person, mid, url): 

Do yourself a big favour, though, and use the ElementTree API instead; that API is far more Pythonic and easier to use than the Python DOM API.
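To make the difference concrete, here is a minimal sketch that runs the zip() version against the sample XML from the question. It parses the XML from an in-memory bytes literal rather than a URL, and the names `SAMPLE_XML` and `rows` are only illustrative.

```python
from xml.dom import minidom

# Copy of the sample XML from the question, held in memory for the demo.
SAMPLE_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<Documents>
    <class>
        <mid name="yyyyyyyyyyyyy"></mid>
        <person name="yyyyyyyyyy"></person>
        <url name="yyyyyyyyy"></url>
    </class>
    <class>
        <mid name="xxxxx"></mid>
        <person name="xxxxxxxxxx"></person>
        <url name="xxxxxxxxxxx"></url>
    </class>
</Documents>"""

dom = minidom.parseString(SAMPLE_XML)
persons = dom.getElementsByTagName('person')
mids = dom.getElementsByTagName('mid')
urls = dom.getElementsByTagName('url')

# "for i, j, k in persons, mids, urls" loops over a 3-item tuple of node
# lists and tries to unpack each whole list into i, j, k. That only works
# if every list happens to contain exactly three nodes.

# zip() instead yields one (person, mid, url) triple per position.
rows = [
    (p.getAttribute('name'), m.getAttribute('name'), u.getAttribute('name'))
    for p, m, u in zip(persons, mids, urls)
]

for row in rows:
    print(*row)
```

Pairing by position assumes every class element contains exactly one person, one mid and one url. If any class is missing one of them, the three lists fall out of step and zip() silently pairs values from different classes.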

Thanks. Works like a charm. – tunaktunak

1

If you want to stick with minidom, you can change the loop to:

# Walk each <class> element and look up its own children
for cls in dom.getElementsByTagName('class'):
    person = cls.getElementsByTagName('person')[0]
    mid = cls.getElementsByTagName('mid')[0]
    url = cls.getElementsByTagName('url')[0]

    print(person.attributes['name'].value)
    print(mid.attributes['name'].value)
    print(url.attributes['name'].value)
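One caveat with indexing `[0]` directly: it raises IndexError when a class element lacks one of the three children. The sketch below is one possible way to tolerate that, returning None for a missing child. The helper names `first_name_attr` and `class_rows`, and the short inline XML, are hypothetical and only for illustration.

```python
from xml.dom import minidom


def first_name_attr(parent, tag):
    """Return the name attribute of the first <tag> under parent, or None."""
    nodes = parent.getElementsByTagName(tag)
    return nodes[0].getAttribute('name') if nodes else None


def class_rows(dom):
    """Build one (person, mid, url) tuple per <class> element."""
    return [
        tuple(first_name_attr(cls, tag) for tag in ('person', 'mid', 'url'))
        for cls in dom.getElementsByTagName('class')
    ]


# Made-up input: the second <class> deliberately has no mid or url.
dom = minidom.parseString(
    b"<Documents>"
    b"<class><mid name='m1'/><person name='p1'/><url name='u1'/></class>"
    b"<class><person name='p2'/></class>"
    b"</Documents>"
)
rows = class_rows(dom)
for row in rows:
    print(row)
```

Because each lookup is scoped to a single class element, a missing child shows up as None in that row instead of shifting values between rows.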

As @Martijn Pieters says, take a look at ElementTree as an alternative API. For example:

import xml.etree.ElementTree as ET

# xmlstr holds the XML document text, e.g. read from the URL
documents = ET.fromstring(xmlstr)
for cls in documents.iter('class'):
    person = cls.find('person')
    mid = cls.find('mid')
    url = cls.find('url')

    print(person.get('name'), mid.get('name'), url.get('name'))
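Since the question reads the XML from a URL rather than from a string, here is a rough sketch of the same ElementTree loop fed from a file-like object. `ET.parse` accepts a filename or a binary file object, so the response returned by `urlopen` should work in its place. The function name `read_staff` is hypothetical, and an `io.BytesIO` stands in for the network response so the example runs offline.

```python
import io
import xml.etree.ElementTree as ET


def read_staff(source):
    """Parse XML from a filename or binary file-like object.

    Returns one (person, mid, url) tuple of name attributes per <class>.
    Assumes every <class> contains all three child elements.
    """
    root = ET.parse(source).getroot()
    rows = []
    for cls in root.iter('class'):
        rows.append(
            tuple(cls.find(tag).get('name') for tag in ('person', 'mid', 'url'))
        )
    return rows


# Stand-in for the object returned by urlopen(url); with a real URL this
# would be something like:
#     with urlopen(url) as response:
#         rows = read_staff(response)
fake_response = io.BytesIO(b"""<?xml version="1.0" encoding="utf-8"?>
<Documents>
    <class>
        <mid name="yyyyyyyyyyyyy"></mid>
        <person name="yyyyyyyyyy"></person>
        <url name="yyyyyyyyy"></url>
    </class>
    <class>
        <mid name="xxxxx"></mid>
        <person name="xxxxxxxxxx"></person>
        <url name="xxxxxxxxxxx"></url>
    </class>
</Documents>""")

rows = read_staff(fake_response)
for person, mid, url in rows:
    print(person, mid, url)
```

If a class element can be missing a child, `cls.find(tag)` returns None and the `.get` call fails, so a check for None would be needed in that case.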
0

I would use XPath and lxml.html, a minimalist approach:

import lxml.html as lh
doc = lh.parse('test.xml')

In [70]: persons = doc.xpath('.//person/@name') 

In [71]: urls=doc.xpath('.//person[@name]/following-sibling::url/@name') 

In [72]: mids=doc.xpath('.//person[@name]/preceding-sibling::mid/@name') 

In [73]: [[p,m,u]for p,m,u in zip(persons, mids, urls)] 
Out[73]: 
[['yyyyyyyyyy', 'yyyyyyyyyyyyy', 'yyyyyyyyy'], 
['xxxxxxxxxx', 'xxxxx', 'xxxxxxxxxxx']]