I want to parse this XML (http://www.reddit.com/r/videos/top/.rss) and am having trouble doing so. I'm trying to save the YouTube link from each item, but I'm running into trouble because of the 'channel' child node. How do I get down to that level so that I can then loop over the items? How do I parse an XML feed with Python?

import urllib2
from xml.etree import ElementTree as etree

#reddit parse
reddit_file = urllib2.urlopen('http://www.reddit.com/r/videos/top/.rss')
#convert to string: 
reddit_data = reddit_file.read() 
#close file because we dont need it anymore: 
reddit_file.close() 

#entire feed 
reddit_root = etree.fromstring(reddit_data) 
channel = reddit_root.findall('{http://purl.org/dc/elements/1.1/}channel') 
print channel 

reddit_feed=[] 
for entry in channel: 
    #get description, url, and thumbnail 
    desc = #not sure how to get this 

    reddit_feed.append([desc]) 

Answers

You can try findall('channel/item'):

import urllib2 
from xml.etree import ElementTree as etree 
#reddit parse 
reddit_file = urllib2.urlopen('http://www.reddit.com/r/videos/top/.rss') 
#convert to string: 
reddit_data = reddit_file.read() 
print reddit_data 
#close file because we dont need it anymore: 
reddit_file.close() 

#entire feed 
reddit_root = etree.fromstring(reddit_data) 
items = reddit_root.findall('channel/item')
print items

reddit_feed = []
for entry in items:
    #get description, url, and thumbnail 
    desc = entry.findtext('description') 
    reddit_feed.append([desc]) 
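
The question also asked for the url and the thumbnail, not just the description. Below is a minimal sketch extending the loop above; it assumes the thumbnail element lives in the standard Media RSS namespace (xmlns:media="http://search.yahoo.com/mrss/"), which is not confirmed by the code above, so adjust MEDIA_NS if the feed declares something else.

#sketch: also pull link and thumbnail for each item
#MEDIA_NS is an assumption (standard Media RSS namespace), adjust if the feed differs
MEDIA_NS = '{http://search.yahoo.com/mrss/}'

reddit_feed = []
for entry in reddit_root.findall('channel/item'):
    desc = entry.findtext('description')
    link = entry.findtext('link')
    thumb_el = entry.find(MEDIA_NS + 'thumbnail')
    thumb = thumb_el.get('url') if thumb_el is not None else None
    reddit_feed.append([desc, link, thumb])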

Here's a version I wrote for you using XPath expressions (tested successfully):

from lxml import etree 
import urllib2 

headers = { 'User-Agent' : 'Mozilla/5.0' } 
req = urllib2.Request('http://www.reddit.com/r/videos/top/.rss', None, headers) 
reddit_file = urllib2.urlopen(req).read() 

reddit = etree.fromstring(reddit_file) 

for item in reddit.xpath('/rss/channel/item'): 
    print "title =", item.xpath("./title/text()")[0] 
    print "description =", item.xpath("./description/text()")[0] 
    print "thumbnail =", item.xpath("./*[local-name()='thumbnail']/@url")[0] 
    print "link =", item.xpath("./link/text()")[0] 
    print "-" * 100