2011-03-08

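Python Universal Feedparser crashes on a Unicode error

I am using OS X 10.6 and Python 2.7.1 with BeautifulSoup 3.0 and feedparser 5.0.1. I am trying to parse the New York Times RSS feed, which validates, and which Beautiful Soup on its own parses happily.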

The minimal code that produces the error is:

import feedparser 
from BeautifulSoup import BeautifulSoup 


feed = feedparser.parse("http://www.nytimes.com/services/xml/rss/nyt/GlobalHome.xml") 
  • It fails whether I pass the URL directly or fetch the content myself with urllib2.urlopen.
  • I have also tried a character set detector.
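For reference, a charset check on the fetched content can start with a byte-order-mark test on the raw bytes, which is the same kind of comparison the BeautifulSoup warnings in the error output refer to. The sketch below is only an illustration using the standard library's codecs constants. It is not the detector mentioned above, and sniff_bom and decode_feed_bytes are hypothetical helper names. It assumes the input is a byte string, not an already-decoded unicode object.

```python
import codecs

# Known byte-order marks, longest first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(raw):
    """Return (encoding, bom_length) for a leading BOM in raw bytes, else (None, 0)."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_feed_bytes(raw, fallback="utf-8"):
    """Decode raw feed bytes, honouring a BOM if present (hypothetical helper).

    The fallback encoding is an assumption; undecodable bytes are replaced.
    """
    encoding, skip = sniff_bom(raw)
    return raw[skip:].decode(encoding or fallback, "replace")
```

Comparing bytes against bytes avoids the Python 2 UnicodeWarning shown in the error output, which appears when a unicode object is compared with a byte string containing non-ASCII bytes.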

Error output:

/Users/user/Source/python/feed/BeautifulSoup.py:1553: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal 
    elif data[:3] == '\xef\xbb\xbf': 
/Users/user/Source/python/feed/BeautifulSoup.py:1556: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal 
    elif data[:4] == '\x00\x00\xfe\xff': 
/Users/user/Source/python/feed/BeautifulSoup.py:1559: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal 
    elif data[:4] == '\xff\xfe\x00\x00': 
Traceback (most recent call last): 
    File "parse.py", line 5, in <module> 
    feed = feedparser.parse("http://www.nytimes.com/services/xml/rss/nyt/GlobalHome.xml") 
    File "/Users/user/Source/python/feed/feedparser.py", line 3822, in parse 
    feedparser.feed(data.decode('utf-8', 'replace')) 
    File "/Users/user/Source/python/feed/feedparser.py", line 1851, in feed 
    sgmllib.SGMLParser.feed(self, data) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 104, in feed 
    self.goahead(0) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 143, in goahead 
    k = self.parse_endtag(i) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 320, in parse_endtag 
    self.finish_endtag(tag) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 360, in finish_endtag 
    self.unknown_endtag(tag) 
    File "/Users/user/Source/python/feed/feedparser.py", line 657, in unknown_endtag 
    method() 
    File "/Users/user/Source/python/feed/feedparser.py", line 1545, in _end_description 
    value = self.popContent('description') 
    File "/Users/user/Source/python/feed/feedparser.py", line 961, in popContent 
    value = self.pop(tag) 
    File "/Users/user/Source/python/feed/feedparser.py", line 868, in pop 
    mfresults = _parseMicroformats(output, self.baseuri, self.encoding) 
    File "/Users/user/Source/python/feed/feedparser.py", line 2420, in _parseMicroformats 
    p = _MicroformatsParser(htmlSource, baseURI, encoding) 
    File "/Users/user/Source/python/feed/feedparser.py", line 2024, in __init__ 
    self.document = BeautifulSoup.BeautifulSoup(data) 
    File "/Users/user/Source/python/feed/BeautifulSoup.py", line 1228, in __init__ 
    BeautifulStoneSoup.__init__(self, *args, **kwargs) 
    File "/Users/user/Source/python/feed/BeautifulSoup.py", line 892, in __init__ 
    self._feed() 
    File "/Users/user/Source/python/feed/BeautifulSoup.py", line 917, in _feed 
    SGMLParser.feed(self, markup) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 103, in feed 
    self.rawdata = self.rawdata + data 
TypeError: cannot concatenate 'str' and 'NoneType' objects 

I would greatly appreciate any advice.

I see no error on Python 2.7 with feedparser 5.0.1 (with no separate BeautifulSoup installation). – jfs 2011-03-28 19:28:50

1 Answer

I tested with Python 2.7.1, feedparser 5.0.1, and BeautifulSoup 3.2.0, and the feed did not cause a traceback. Try upgrading to BeautifulSoup 3.2.0.
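Before upgrading, it may be worth confirming which copies Python actually imports, since the traceback shows both feedparser.py and BeautifulSoup.py being loaded from /Users/user/Source/python/feed/ and not from site-packages. A minimal sketch follows. describe_module is a hypothetical helper name, and it assumes each module exposes a __version__ attribute, falling back to "unknown" if it does not.

```python
import importlib


def describe_module(name):
    """Return (version, file path) for an importable module, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except Exception:
        # ImportError if missing; other errors if an incompatible copy is found.
        return None
    version = getattr(module, "__version__", "unknown")
    path = getattr(module, "__file__", "unknown")
    return version, path


# Report what the interpreter picks up for the two libraries in question.
for name in ("feedparser", "BeautifulSoup"):
    print("%s: %r" % (name, describe_module(name)))
```

If the reported path points at a local BeautifulSoup.py next to the script, that copy shadows any newer version installed elsewhere.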

Thanks for your help. – UberAlex 2011-04-01 13:15:14