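I am using OS X 10.6 and Python 2.7.1 with BeautifulSoup 3.0 and feedparser 5.0.1. I am trying to parse the New York Times RSS feed, which validates and which BeautifulSoup on its own will happily parse. The Python Universal Feed Parser, however, crashes with what looks like a Unicode error.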
The minimal code that produces the error is:
import feedparser
from BeautifulSoup import BeautifulSoup
feed = feedparser.parse("http://www.nytimes.com/services/xml/rss/nyt/GlobalHome.xml")
- It fails whether I pass the URL directly or fetch the content with urllib2.urlopen first.
- I have also tried a charset detector.
The error output:
/Users/user/Source/python/feed/BeautifulSoup.py:1553: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
elif data[:3] == '\xef\xbb\xbf':
/Users/user/Source/python/feed/BeautifulSoup.py:1556: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
elif data[:4] == '\x00\x00\xfe\xff':
/Users/user/Source/python/feed/BeautifulSoup.py:1559: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
elif data[:4] == '\xff\xfe\x00\x00':
Traceback (most recent call last):
File "parse.py", line 5, in <module>
feed = feedparser.parse("http://www.nytimes.com/services/xml/rss/nyt/GlobalHome.xml")
File "/Users/user/Source/python/feed/feedparser.py", line 3822, in parse
feedparser.feed(data.decode('utf-8', 'replace'))
File "/Users/user/Source/python/feed/feedparser.py", line 1851, in feed
sgmllib.SGMLParser.feed(self, data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 104, in feed
self.goahead(0)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 143, in goahead
k = self.parse_endtag(i)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 320, in parse_endtag
self.finish_endtag(tag)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 360, in finish_endtag
self.unknown_endtag(tag)
File "/Users/user/Source/python/feed/feedparser.py", line 657, in unknown_endtag
method()
File "/Users/user/Source/python/feed/feedparser.py", line 1545, in _end_description
value = self.popContent('description')
File "/Users/user/Source/python/feed/feedparser.py", line 961, in popContent
value = self.pop(tag)
File "/Users/user/Source/python/feed/feedparser.py", line 868, in pop
mfresults = _parseMicroformats(output, self.baseuri, self.encoding)
File "/Users/user/Source/python/feed/feedparser.py", line 2420, in _parseMicroformats
p = _MicroformatsParser(htmlSource, baseURI, encoding)
File "/Users/user/Source/python/feed/feedparser.py", line 2024, in __init__
self.document = BeautifulSoup.BeautifulSoup(data)
File "/Users/user/Source/python/feed/BeautifulSoup.py", line 1228, in __init__
BeautifulStoneSoup.__init__(self, *args, **kwargs)
File "/Users/user/Source/python/feed/BeautifulSoup.py", line 892, in __init__
self._feed()
File "/Users/user/Source/python/feed/BeautifulSoup.py", line 917, in _feed
SGMLParser.feed(self, markup)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 103, in feed
self.rawdata = self.rawdata + data
TypeError: cannot concatenate 'str' and 'NoneType' objects
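As far as I can tell, the last frame of the traceback means the parser's feed() received None instead of a string. Below is a minimal sketch of that failure mode only. TinyParser and try_feed are hypothetical names of mine, not part of sgmllib, feedparser, or BeautifulSoup, and the sketch does not explain why None reaches the parser in the first place.

```python
class TinyParser(object):
    """Hypothetical stand-in for a parser that buffers input in a string."""

    def __init__(self):
        self.rawdata = ""

    def feed(self, data):
        # Same shape as the failing line in the traceback:
        # appending None to a string buffer raises TypeError.
        self.rawdata = self.rawdata + data


def try_feed(data):
    """Return the name of the exception raised by feed(), or None on success."""
    parser = TinyParser()
    try:
        parser.feed(data)
    except TypeError as exc:
        return type(exc).__name__
    return None


print(try_feed("<rss></rss>"))  # string input is buffered without error
print(try_feed(None))           # None input raises TypeError
```

So the UnicodeWarning lines appear to be warnings only, and the crash itself is the TypeError raised when None is concatenated onto the buffer.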
I would greatly appreciate any advice.
I see no error with Python 2.7 and feedparser 5.0.1 (without a separate BeautifulSoup installation). – jfs 2011-03-28 19:28:50