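Small problem parsing XHTML. I'm really stuck here and I don't understand what's happening. I just want to parse a normal XHTML page fetched from the web, nothing special, using lxml in Python.
Here is the error: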
File "class/page.py", line 85, in xslParse
doc = lxml.etree.fromstring(self.content)
File "lxml.etree.pyx", line 2753, in lxml.etree.fromstring (src/lxml/lxml.etree.c:54647)
File "parser.pxi", line 1578, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82764)
File "parser.pxi", line 1457, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:81562)
File "parser.pxi", line 965, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:78232)
File "parser.pxi", line 569, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:74488)
File "parser.pxi", line 650, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:75379)
File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:74712)
XMLSyntaxError: StartTag: invalid element name, line 1, column 2
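self.content is a plain string taken straight from the HTTP response: no cleaning, no replacing, nothing, just what the server returned. So what the fu..?
The HTML starts like this: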
<!doctype html>
<!-- paulirish.com/2008/conditional-stylesheets-vs-css-hacks-answer-neither/ -->
<!--[if lt IE 7 ]> <html lang="fr" class="no-js ie6" itemscope itemtype="http://schema.org/Product"> <![endif]-->
<!--[if IE 7 ]> <html lang="fr" class="no-js ie7" itemscope itemtype="http://schema.org/Product"> <![endif]-->
<!--[if IE 8 ]> <html lang="fr" class="no-js ie8" itemscope itemtype="http://schema.org/Product"> <![endif]-->
<!--[if IE 9 ]> <html lang="fr" class="no-js ie9" itemscope itemtype="http://schema.org/Product"> <![endif]-->
<!--[if (gt IE 9)|!(IE)]><!--> <html lang="en" class="no-js" itemscope itemtype="http://schema.org/Product"> <!--<![endif]-->
<head>......
It's a normal web page, so why can't lxml parse a normal, well-formed document?
Have you tried using 'lxml.html.fromstring' instead of 'lxml.etree.fromstring'? – unutbu 2012-08-11 20:19:59
Going to check that right now! thx bro – hgates 2012-08-11 20:36:07
It works much better, thx really :) – hgates 2012-08-11 20:45:47
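
For anyone landing here later, a minimal sketch of the suggestion from the comments. The `content` string below is a shortened, made-up stand-in for the real server response, and `parse_page` is a hypothetical helper name, not something from the original `class/page.py`. The likely cause of the error is that the page is HTML5 rather than well-formed XML: the strict XML parser behind `lxml.etree.fromstring` appears to choke on the lowercase `<!doctype html>` at line 1, column 2 (and would also reject valueless attributes such as `itemscope`), while `lxml.html.fromstring` uses the lenient HTML parser.

```python
import lxml.etree
import lxml.html

# Shortened stand-in for the HTTP response body (assumption: the real page
# starts the same way, with a lowercase doctype and IE conditional comments).
content = """<!doctype html>
<!--[if lt IE 7 ]> <html lang="fr" class="no-js ie6"> <![endif]-->
<!--[if (gt IE 9)|!(IE)]><!--> <html lang="en" class="no-js"> <!--<![endif]-->
<head><title>Example</title></head>
<body><p>Hello</p></body>
</html>"""


def parse_page(text):
    """Hypothetical helper: parse real-world HTML with the lenient HTML parser."""
    return lxml.html.fromstring(text)


# The strict XML parser rejects the document and raises XMLSyntaxError.
try:
    lxml.etree.fromstring(content)
    xml_ok = True
except lxml.etree.XMLSyntaxError as exc:
    xml_ok = False
    print("XML parser failed:", exc)

# The HTML parser accepts the same string and returns the root element.
doc = parse_page(content)
print(doc.tag)
print(doc.findtext(".//title"))
```

The element returned by `lxml.html.fromstring` supports the usual lxml element API (`find`, `xpath`, and so on), so the rest of the code that expected an `lxml.etree` element should mostly keep working.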