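Iterating through XML to find URLs with a specific extension in Python

I have an XML file that I download from a URL. I then want to iterate through the XML to find links to files with a specific file extension.

My XML looks like this: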

<Foo> 
    <bar> 
     <file url="http://foo.txt"/> 
     <file url="http://bar.doc"/> 
    </bar> 
</Foo>

The code I wrote to fetch the XML file looks like this:

import urllib2, re 
from xml.dom.minidom import parseString 

file = urllib2.urlopen('http://foobar.xml') 
data = file.read() 
file.close() 
dom = parseString(data) 
xmlTag = dom.getElementsByTagName('file')

Then I'd "like" to get something like this working:

i = 0
url = ''
while i < len(xmlTag):
    if re.search('*.txt', xmlTag[i].toxml()) is not None:
        url = xmlTag[i].toxml()
    i = i + 1

** Some code that parses out the url **

However, this raises an error. Does anyone have tips on a better approach?

Thanks!


2012-07-10 ZacAttack

Your last block of code is, frankly, gross. dom.getElementsByTagName('file') gives you a list of all the <file> elements in the tree... just iterate over it.

urls = []
for file_node in dom.getElementsByTagName('file'):
    url = file_node.getAttribute('url')
    if url.endswith('.txt'):
        urls.append(url)
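
For readers on Python 3, the same attribute-based filter can be sketched as a self-contained snippet. It assumes the sample XML from the question is held in a string rather than downloaded, and the helper names `urls_with_extension` and `is_valid_regex` are made up for illustration. It also hints at why the original loop failed: '*.txt' is a shell-style glob, not a valid regular expression, so the `re` module rejects it.

```python
import re
from xml.dom.minidom import parseString

# Sample document from the question, inlined here instead of downloaded.
SAMPLE_XML = """<Foo>
    <bar>
        <file url="http://foo.txt"/>
        <file url="http://bar.doc"/>
    </bar>
</Foo>"""


def urls_with_extension(xml_text, extension):
    """Return the url attribute of every <file> element ending in `extension`.

    Hypothetical helper name, not part of any library.
    """
    dom = parseString(xml_text)
    return [
        node.getAttribute('url')
        for node in dom.getElementsByTagName('file')
        if node.getAttribute('url').endswith(extension)
    ]


def is_valid_regex(pattern):
    """Report whether `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


txt_urls = urls_with_extension(SAMPLE_XML, '.txt')
print(txt_urls)
print(is_valid_regex('*.txt'))    # the pattern from the question
print(is_valid_regex(r'\.txt$'))  # a regex that expresses the same intent
```

Comparing the attribute value with `endswith` avoids regular expressions entirely, which is probably the simpler choice when only one extension matters.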

By the way, you should never do manual indexing in Python. Even in the rare case where you need the index number, just use enumerate:

mylist = ['a', 'b', 'c'] 
for i, value in enumerate(mylist): 
    print i, value


2012-07-10 22:02:56

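Yeah, it's all been pretty gross today. I only picked up Python last week. But this works perfectly! I just had to change the line "url = file_node.getAttribute('urls')" to "url = file_node.getAttribute('url')" and it works like a charm. Thanks! – ZacAttack 2012-07-10 22:17:40

@ZacAttack derp, error corrected. – 2012-07-10 22:19:28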

An example using lxml, urlparse and os.path:

from lxml import etree 
from urlparse import urlparse 
from os.path import splitext 

data = """ 
<Foo> 
    <bar> 
     <file url="http://foo.txt"/> 
     <file url="http://bar.doc"/> 
    </bar> 
</Foo> 
""" 

tree = etree.fromstring(data).getroottree() 
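# Select the url attribute of every <file> element under <Foo>/<bar>.
for url in tree.xpath('//Foo/bar/file/@url'):
    spliturl = urlparse(url)
    # The sample URLs have no path, so the file name ends up in netloc.
    name, ext = splitext(spliturl.netloc)
    print url, 'is a', ext, 'file'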
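
The example above reads the extension from netloc, which works only because the sample URLs have no path component. A sketch for more typical URLs might take the extension from the URL path instead, falling back to the host when the path is empty. This version assumes Python 3 and swaps lxml for the standard library's xml.etree.ElementTree so that it runs without extra packages; the example.com URL and the `url_extension` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET
from os.path import splitext
from urllib.parse import urlparse

# Illustrative data: the example.com URL is an assumption, not from the question.
DATA = """<Foo>
    <bar>
        <file url="http://foo.txt"/>
        <file url="http://example.com/docs/report.doc?version=2"/>
    </bar>
</Foo>"""


def url_extension(url):
    """Return the file extension of a URL (hypothetical helper).

    Uses the path when there is one, otherwise the host name,
    so a query string never leaks into the extension.
    """
    parts = urlparse(url)
    target = parts.path or parts.netloc
    return splitext(target)[1]


root = ET.fromstring(DATA)
# Map each url attribute to its extension.
extensions = {
    node.get('url'): url_extension(node.get('url'))
    for node in root.iter('file')
}
for url, ext in extensions.items():
    print(url, 'is a', ext, 'file')
```

Parsing the URL first keeps the query string out of the extension check, which a plain endswith test on the whole URL would not do.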


2012-07-10 22:14:31
