Python BeautifulSoup error
I have this script:
import urllib2
from BeautifulSoup import BeautifulSoup
import html5lib
import lxml
soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read())
But this gives me the following error:
Traceback (most recent call last):
  File "akaConnection.py", line 59, in <module>
    soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read())
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 56, column 872
Then I tried changing the code to:
soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read(),"lxml")
or
soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read(),"html5lib")
That gives me this error instead:
Traceback (most recent call last):
  File "akaConnection.py", line 59, in <module>
    soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read(),"lxml")
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 156, in goahead
    k = self.parse_declaration(i)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1112, in parse_declaration
    j = HTMLParser.parse_declaration(self, i)
  File "/usr/lib/python2.6/markupbase.py", line 109, in parse_declaration
    self.handle_decl(data)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1097, in handle_decl
    self._toStringSubclass(data, Declaration)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1030, in _toStringSubclass
    self.soup.endData(subclass)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1318, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'
I am running Ubuntu 10.04 (Linux) with Python 2.6.5, and my BeautifulSoup version is "3.1.0.1". How can I fix my code, or is there something I am missing?
Your initial script seems to work for me. Which version of BeautifulSoup do you have? Mine is 3.0.8.1. – Eli
For really broken HTML, another option is to run it through Tidy first, for example with http://countergram.com/open-source/pytidylib – Eli
The second error comes from copying a BeautifulSoup 4 example and trying to use it with BeautifulSoup 3. BS3 does not use lxml or html5lib. –
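To illustrate why the second attempt fails with an AttributeError rather than a parser error, here is a minimal sketch. It does not use BeautifulSoup itself: `Bs3LikeSoup`, `end_data` and `try_parse` are hypothetical stand-ins that only imitate what the traceback suggests, namely that in BeautifulSoup 3 the second positional argument ends up in `parseOnlyThese` and its `.text` attribute is read later. The real check in BeautifulSoup.py is more involved than this.

```python
class Bs3LikeSoup(object):
    """Hypothetical stand-in that imitates only the argument order seen in BS3."""

    def __init__(self, markup="", parseOnlyThese=None):
        self.markup = markup
        # Assumption based on the traceback: the second positional slot is a
        # filter object, not the name of a parser.
        self.parseOnlyThese = parseOnlyThese

    def end_data(self):
        # Simplified version of the failing line: the filter's .text is read.
        if self.parseOnlyThese is not None:
            return self.parseOnlyThese.text
        return None


def try_parse(*args):
    """Return None if the call works, otherwise the AttributeError message."""
    try:
        Bs3LikeSoup(*args).end_data()
    except AttributeError as exc:
        return str(exc)
    return None


# Only markup: nothing lands in the filter slot, so no error.
error_without_parser = try_parse("<p>hi</p>")

# Markup plus "lxml": the string is treated as the filter and has no .text.
error_with_parser_name = try_parse("<p>hi</p>", "lxml")
print(error_with_parser_name)
```

In BeautifulSoup 4 (imported as `from bs4 import BeautifulSoup`) the second positional argument is the parser name, so a call such as `BeautifulSoup(html, "lxml")` is only meaningful there, and only if that parser is installed.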