Parsing an XML RSS feed byte stream for the <item> tag

I'm trying to parse an RSS feed for the first instance of an element, "<item>".

import urllib2

def pageReader(url):
    try:
        readPage = urllib2.urlopen(url)
    except urllib2.HTTPError as e:  # HTTPError subclasses URLError, so it must be caught first
        # print('The server couldn\'t fulfill the request.')
        # print('Error code: ', e.code)
        return 404
    except urllib2.URLError as e:
        # print 'We failed to reach a server.'
        # print 'Reason: ', e.reason
        return 404
    else:
        outputPage = readPage.read()
        return outputPage
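For comparison, here is a minimal Python 3 sketch of the same fetch step, assuming Python 3's urllib.request (urllib2 no longer exists there). It catches HTTPError before URLError, because HTTPError is a subclass of URLError and would otherwise never reach its own clause. It returns the body as bytes rather than a decoded str, so an XML parser can honour the feed's declared encoding. The name page_reader and the None-on-failure convention are hypothetical choices, not part of the original code.

```python
import urllib.error
import urllib.request


def page_reader(url, timeout=10):
    """Fetch url and return the raw response body as bytes, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()  # bytes under Python 3, not str
    except urllib.error.HTTPError:
        # The server answered, but with an error status; must precede URLError.
        return None
    except urllib.error.URLError:
        # The server could not be reached at all (bad host, refused, missing file...).
        return None
```

Returning None instead of 404 keeps a failed fetch distinguishable from any real HTTP status code.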

Assume the argument passed in is correct. The function returns a str object whose value is simply the complete RSS feed; I've verified the type:

a = isinstance(value, str) 
if not a: 
    return -1

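So a complete RSS feed is returned from the function call, and this is where I hit a brick wall. I've tried parsing the feed with BeautifulSoup, lxml and various other libs, without success. (I had some success with BeautifulSoup, but it can't pull certain child elements from the parent, for example <link> and <pubDate>.) I was about ready to resort to writing my own parser, but I wanted to know if anyone has any suggestions.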

To reproduce my error, just call the function above with an argument like this:

http://www.cert.org/nav/cert_announcements.rss

You'll see the first child, the <item> I'm trying to return:

<item> 
<title>New Blog Entry: Common Sense Guide to Mitigating Insider Threats - Best Practice 16 (of 19)</title> 
<link>http://www.cert.org/blogs/insider_threat/2013/02/common_sense_guide_to_mitigating_insider_threats_-_best_practice_16_of_19.html</link> 
<description>This sixteenth of 19 blog posts about the fourth edition of the Common Sense Guide to Mitigating Insider Threats describes Practice 16: Develop a formalized insider threat program.</description> 
<pubDate>Wed, 06 Feb 2013 06:38:07 -0500</pubDate> 
</item>
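As a sketch of one more option (not one of the libraries tried above), Python 3's standard-library xml.etree.ElementTree can pull the first <item>'s children directly from the fetched bytes. It assumes an RSS 2.0 layout (rss > channel > item) and has not been run against the live feed at the URL above; first_item is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def first_item(feed_bytes):
    """Return the children of the feed's first <item> as a {tag: text} dict.

    Passing bytes (not a decoded str) lets the parser use the encoding
    declared in the XML prolog.
    """
    root = ET.fromstring(feed_bytes)  # the <rss> element
    item = root.find("channel/item")  # first <item> under <channel>
    if item is None:
        return None
    # XML tag matching is case-sensitive, so "pubDate" keeps its capital D.
    return {child.tag: (child.text or "").strip() for child in item}
```

If an item repeats a tag (several <category> elements, say), the dict keeps only the last one; namespaced children show up with their full "{uri}name" tag.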

As I said, BeautifulSoup can't find pubDate and link, which are crucial to my application.
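Once the pubDate text has been extracted, it will usually need to become a real timestamp. A minimal sketch, assuming the feed uses the RFC 822 style date that RSS 2.0 specifies (as in the sample item), using email.utils.parsedate_to_datetime from the Python 3 standard library; parse_pub_date is a hypothetical helper name.

```python
from email.utils import parsedate_to_datetime


def parse_pub_date(text):
    """Convert an RSS <pubDate> string into a timezone-aware datetime, or None."""
    try:
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError on unparseable input, 3.10+ raises ValueError.
        return None
```

For the sample item, "Wed, 06 Feb 2013 06:38:07 -0500" becomes 2013-02-06 06:38:07 at UTC-05:00, which compares correctly against dates from feeds in other time zones.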

Any advice would be greatly appreciated.


2013-02-07 0xd3f4ce

I had some success using BeautifulStoneSoup and passing the tag names in lowercase, like this:

from BeautifulSoup import BeautifulStoneSoup  # BeautifulSoup 3, Python 2
xml = '<item><title>New Blog Entry: Common Sense Guide to Mitigating Insider Threats - Best Practice 16 (of 19)</title><link>http://www.cert.org/blogs/insider_threat/2013/02/common_sense_guide_to_mitigating_insider_threats_-_best_practice_16_of_19.html</link><description>This sixteenth of 19 blog posts about the fourth edition of the Common Sense Guide to Mitigating Insider Threats describes Practice 16: Develop a formalized insider threat program.</description><pubDate>Wed, 06 Feb 2013 06:38:07 -0500</pubDate></item>'

soup = BeautifulStoneSoup(xml)
item = soup('item')[0]  # first <item> element
# Look the children up by their lowercase names: 'pubdate', not 'pubDate'
print item('pubdate'), item('link')


2013-02-07 20:45:29 That1Guy

So 'BeautifulSoup' lowercases all the tags? – isedev
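On the lowercasing question: the sketch below does not use BeautifulSoup. It only illustrates the general behaviour, assuming Python 3's standard library: html.parser.HTMLParser reports tag names lowercased, while xml.etree.ElementTree, an XML parser, matches names case-sensitively. That is consistent with the answer's observation that searching for 'pubdate' worked. SNIPPET, TagCollector, html_tag_names and xml_has_child are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SNIPPET = ("<item><link>http://www.cert.org/blogs/insider_threat/</link>"
           "<pubDate>Wed, 06 Feb 2013 06:38:07 -0500</pubDate></item>")


class TagCollector(HTMLParser):
    """Record every start-tag name exactly as the HTML parser reports it."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)  # HTMLParser has already lowercased this name


def html_tag_names(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_has_child(markup, name):
    # ElementTree treats element names as case-sensitive.
    return ET.fromstring(markup).find(name) is not None
```

Here html_tag_names(SNIPPET) reports 'pubdate', while xml_has_child finds "pubDate" but not "pubdate".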
