<script> tag and HTMLParseError
I am trying to parse HTML with BeautifulSoup and I get a strange error. Here is minimal code that reproduces the problem (Windows 7 32-bit, ActivePython 2.7).
from bs4 import BeautifulSoup
s = """
<html>
<script>
var pstr = "<li><font color='blue'>1</font></li>";
for(var lc=0;lc<o.length;lc++){}
</script>
</html>
"""
p = BeautifulSoup(s)
Traceback:
Traceback (most recent call last):
File "<pyshell#69>", line 1, in <module>
p = BeautifulSoup(s)
File "C:\Python27\lib\site-packages\bs4\__init__.py", line 168, in __init__
self._feed()
File "C:\Python27\lib\site-packages\bs4\__init__.py", line 181, in _feed
self.builder.feed(self.markup)
File "C:\Python27\lib\site-packages\bs4\builder\_htmlparser.py", line 56, in feed
super(HTMLParserTreeBuilder, self).feed(markup)
File "C:\Python27\lib\HTMLParser.py", line 108, in feed
self.goahead(0)
File "C:\Python27\lib\HTMLParser.py", line 148, in goahead
k = self.parse_starttag(i)
File "C:\Python27\lib\HTMLParser.py", line 229, in parse_starttag
endpos = self.check_for_whole_start_tag(i)
File "C:\Python27\lib\HTMLParser.py", line 304, in check_for_whole_start_tag
self.error("malformed start tag")
File "C:\Python27\lib\HTMLParser.py", line 115, in error
raise HTMLParseError(message, self.getpos())
HTMLParseError: malformed start tag, at line 5, column 25
If you remove the line that starts with 'var pstr = ...', parsing works fine. Is there a way to parse HTML like this correctly?
Your code works for me with Python 2.7.3 and BeautifulSoup 3.2.0. – garnertb
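Since the traceback ends inside the standard library's HTMLParser rather than in bs4 itself, one way to narrow this down is to feed the same markup straight to the standard-library parser and see whether it treats the body of <script> as plain text. The following is only a minimal sketch, assuming Python 3 (where the module is named html.parser; the traceback above comes from the Python 2 HTMLParser module). TagCollector and its attributes are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser  # Python 3 module name (assumption; Python 2 uses HTMLParser)

MARKUP = """
<html>
<script>
var pstr = "<li><font color='blue'>1</font></li>";
for(var lc=0;lc<o.length;lc++){}
</script>
</html>
"""


class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags and the text found inside <script>."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.script_text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        # Called only for tags the parser recognises as real markup.
        self.start_tags.append(tag)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect and join later.
        if self._in_script:
            self.script_text.append(data)


collector = TagCollector()
collector.feed(MARKUP)
collector.close()

start_tags = collector.start_tags
script_body = "".join(collector.script_text)

print(start_tags)
print(script_body)
```

If the <li> and <font> inside the JavaScript string do not show up in start_tags, the parser is handling the script body as text and the markup itself is fine. If the built-in parser in your Python version does choke on it, Beautiful Soup 4 also accepts a parser name as a second argument, for example BeautifulSoup(s, "lxml") or BeautifulSoup(s, "html5lib"), provided that library is installed.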