2011-07-02 44 views

Unable to retrieve multiple tags from a feed using feedparser

I have the following XML document:

<?xml version='1.0' encoding='UTF-8'?><entry xmlns='http://www.w3.org/2005/Atom' xmlns:gd='http://schemas.google.com/g/2005' xmlns:issues='http://schemas.google.com/projecthosting/issues/2009' gd:etag='W/"DEAERH47eCl7ImA9WhZTFEQ."'><id>http://code.google.com/feeds/issues/p/chromium/issues/full/921</id><published>2008-09-03T22:51:22.000Z</published><updated>2011-03-19T01:05:05.000Z</updated><title>Incorrect rendering</title><content type='html'>Product Version  : 0.2.149.27 
URLs (if applicable) : http://www.battlefield.ea.com/battlefield/bf/ 
<b>Other browsers tested:</b> 
<b>Add OK or FAIL after other browsers where you have tested this issue:</b> 
    Safari 3: N/A 
    Firefox 3: OK 
     IE 7: OK 
    Opera 9.60: OK 

<b>What steps will reproduce the problem?</b> 
1. Open http://www.battlefield.ea.com/battlefield/bf/ 
2. Look at incorrect render 
</content><link rel='replies' type='application/atom+xml' href='http://code.google.com/feeds/issues/p/chromium/issues/921/comments/full'/><link rel='alternate' type='text/html' href='http://code.google.com/p/chromium/issues/detail?id=921'/><link rel='self' type='application/atom+xml' href='https://code.google.com/feeds/issues/p/chromium/issues/full/921'/><author><name>[email protected]</name><uri>/u/@UBBRQVRZAxFEXgB4GA%3D%3D/</uri></author><issues:closedDate>2009-05-14T20:08:31.000Z</issues:closedDate><issues:id>921</issues:id><issues:label>Type-Bug</issues:label><issues:label>Pri-2</issues:label><issues:label>OS-All</issues:label><issues:label>Area-Compat</issues:label><issues:label>Webkit-specific</issues:label><issues:label>Mstone-2.1</issues:label><issues:label>compat-bug-2.0</issues:label><issues:label>Report-to-webkit</issues:label><issues:label>bulkmove</issues:label><issues:label>Action-ReductionNeeded</issues:label><issues:stars>5</issues:stars><issues:state>closed</issues:state><issues:status>WontFix</issues:status></entry> 

I am parsing this document with feedparser, as follows:

import feedparser 
text = "" #Read from the above document 
d = feedparser.parse(text) 
d.entries[0].issues_label 

I observe that I get only one label:

d.entries[0].issues_label 
u'Action-ReductionNeeded' 

The entry contains multiple issue labels:

<issues:label>Type-Bug</issues:label><issues:label>Pri-2</issues:label><issues:label>OS-All</issues:label><issues:label>Area-Compat</issues:label><issues:label>Webkit-specific</issues:label><issues:label>Mstone-2.1</issues:label><issues:label>compat-bug-2.0</issues:label><issues:label>Report-to-webkit</issues:label><issues:label>bulkmove</issues:label><issues:label>Action-ReductionNeeded</issues:label> 

However, I can retrieve only the last one. I would like to retrieve all of them.

Answers


You can use lxml instead to parse the XML:

>>> import lxml.etree 
>>> doc = lxml.etree.parse(xml)  # xml: a filename or file-like object holding the document above
>>> ns = {'issues':'http://schemas.google.com/projecthosting/issues/2009'} 
>>> [x.text for x in doc.xpath('//issues:label', namespaces=ns)] 
<<< 
['Type-Bug', 
'Pri-2', 
'OS-All', 
'Area-Compat', 
'Webkit-specific', 
'Mstone-2.1', 
'compat-bug-2.0', 
'Report-to-webkit', 
'bulkmove', 
'Action-ReductionNeeded'] 
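
If lxml is not installed, the same namespace-aware lookup can be sketched with the standard library's xml.etree.ElementTree. This is a different parser from the one used above, not a feedparser feature. The helper name extract_labels and the trimmed SAMPLE entry are hypothetical, introduced only for illustration; the namespace URI is the one declared as xmlns:issues in the question's document.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns:issues declaration in the feed entry.
ISSUES_NS = {"issues": "http://schemas.google.com/projecthosting/issues/2009"}


def extract_labels(xml_text):
    """Return the text of every <issues:label> element, in document order.

    Hypothetical helper, not part of feedparser or lxml.
    """
    root = ET.fromstring(xml_text)
    # ".//" searches all descendants of the root <entry> element.
    return [el.text for el in root.findall(".//issues:label", ISSUES_NS)]


# Trimmed-down stand-in for the entry in the question (illustrative only).
SAMPLE = (
    "<entry xmlns='http://www.w3.org/2005/Atom' "
    "xmlns:issues='http://schemas.google.com/projecthosting/issues/2009'>"
    "<title>Incorrect rendering</title>"
    "<issues:label>Type-Bug</issues:label>"
    "<issues:label>Pri-2</issues:label>"
    "<issues:label>OS-All</issues:label>"
    "</entry>"
)

labels = extract_labels(SAMPLE)
print(labels)
```

Because findall returns every matching element rather than a single value per name, repeated issues:label elements all survive, which is the behaviour the question is after.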

Thanks, that works well. – Dexter