2012-05-05

<script> tag and HTMLParseError

I'm trying to parse HTML with BeautifulSoup and I get a strange error. Here is minimal code that reproduces the problem (Windows 7 32-bit, ActivePython 2.7):

from bs4 import BeautifulSoup 
s = """ 
<html> 
<script> 
var pstr = "<li><font color='blue'>1</font></li>"; 
for(var lc=0;lc<o.length;lc++){} 
</script> 
</html> 
""" 
p = BeautifulSoup(s) 

Traceback:

Traceback (most recent call last):
  File "<pyshell#69>", line 1, in <module>
    p = BeautifulSoup(s)
  File "C:\Python27\lib\site-packages\bs4\__init__.py", line 168, in __init__
    self._feed()
  File "C:\Python27\lib\site-packages\bs4\__init__.py", line 181, in _feed
    self.builder.feed(self.markup)
  File "C:\Python27\lib\site-packages\bs4\builder\_htmlparser.py", line 56, in feed
    super(HTMLParserTreeBuilder, self).feed(markup)
  File "C:\Python27\lib\HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "C:\Python27\lib\HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "C:\Python27\lib\HTMLParser.py", line 229, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "C:\Python27\lib\HTMLParser.py", line 304, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "C:\Python27\lib\HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParseError: malformed start tag, at line 5, column 25

If you remove the line starting with 'var pstr = ...', parsing works fine. Is there a way to get HTML like this parsed correctly?


Your code works for me with Python 2.7.3 and BeautifulSoup 3.2.0. – garnertb

Answer


You could try an older version of BS, or install a different parser. See the "you need a parser" and "installing a parser" sections of the documentation on the BeautifulSoup website.

Your current code works with Python 2.7 and BS3:

from BeautifulSoup import BeautifulSoup 
s = """ 
<html> 
<script> 
var pstr = "<li><font color='blue'>1</font></li>"; 
for(var lc=0;lc<o.length;lc++){} 
</script> 
</html> 
""" 
p = BeautifulSoup(s) 

print p.find('script').text 

and produces this output:

var pstr = "<li><font color='blue'>1</font></li>"; 
for(var lc=0;lc<o.length;lc++){}
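To check whether the problem lies in the parser rather than in the markup, here is a minimal sketch that bypasses BeautifulSoup and feeds the same markup straight to the standard-library parser (`html.parser` on Python 3, falling back to the `HTMLParser` module on Python 2). `ScriptCollector` is a hypothetical helper written only for this illustration. On parser versions that treat `<script>` content as raw text up to the closing `</script>` tag, such as Python 3's `html.parser`, the JavaScript (including the `<li>` string literal and the `lc<o.length` comparison) should come through as plain data; on the older parser shown in the traceback above it may still raise `HTMLParseError`.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class ScriptCollector(HTMLParser):
    """Hypothetical helper: collects the raw text of every <script> element."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the current one.
        if self.in_script:
            self.scripts[-1] += data


s = """
<html>
<script>
var pstr = "<li><font color='blue'>1</font></li>";
for(var lc=0;lc<o.length;lc++){}
</script>
</html>
"""

parser = ScriptCollector()
parser.feed(s)
parser.close()
print(parser.scripts[0])
```

If this runs cleanly on your interpreter, the markup itself is fine and the failure comes from the parser version that BeautifulSoup is using underneath, which is why switching parsers (or BS versions) is the suggested fix.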