BeautifulSoup not correctly parsing script text/template

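I have a fairly complex template script that BeautifulSoup4 doesn't understand for some reason. As you can see below, BS4 only partially parses the tree before giving up. Why is this, and is there a way to fix it?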

>>> from bs4 import BeautifulSoup 
>>> html = """<script id="scriptname" type="text/template"><section class="sectionname"><header><h1>Test</h1></header><table><tr><th>Title</th><td class="class"></td><th>Title</th><td class="class"></td></tr><tr><th>Title</th><td class="class"></td><th>Another row</th><td class="checksum"></td></tr></table></section></script> Other stuff I want to stay""" 
>>> soup = BeautifulSoup(html) 
>>> soup.findAll('script') 
[<script id="scriptname" type="text/template"><section class="sectionname"><header><h1>Test</script>]

Edit: On further testing, BS3 seems to parse this correctly for some reason:

>>> from BeautifulSoup import BeautifulSoup as bs3 
>>> soup = bs3(html) 
>>> soup.script 
<script id="scriptname" type="text/template"><section class="sectionname"><header><h1>Test</h1></header><table><tr><th>Title</th><td class="class"></td><th>Title</th><td class="class"></td></tr><tr><th>Title</th><td class="class"></td><th>Another row</th><td class="checksum"></td></tr></table></section></script>


2013-11-20 jumbopap

Which version? – bnjmn

I'm using version 4.3.2 – jumbopap

find_all and findAll are the same. Either one works: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#method-names – jumbopap

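Beautiful Soup sometimes fails with its default parser. Beautiful Soup supports the HTML parser included in Python's standard library, but it also supports a number of third-party Python parsers.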
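Beautiful Soup's "html.parser" option is built on the standard library's html.parser.HTMLParser, so one way to see what that tokenizer does with a text/template script is to drive it directly. This is a minimal sketch, assuming Python 3 and a trimmed version of the question's template; ScriptTextCollector is a hypothetical helper, not part of Beautiful Soup. It records the raw text reported inside <script> and any start tags the tokenizer emits there.

```python
from html.parser import HTMLParser

# Trimmed version of the template from the question.
TEMPLATE = ('<section class="sectionname"><header><h1>Test</h1></header>'
            '<table><tr><th>Title</th><td class="class"></td></tr></table>'
            '</section>')
HTML = ('<script id="scriptname" type="text/template">' + TEMPLATE +
        '</script> Other stuff I want to stay')


class ScriptTextCollector(HTMLParser):
    """Hypothetical helper: collects the text the stdlib tokenizer reports
    inside <script>, any start tags it emits there, and the text after it."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_chunks = []       # raw text seen inside <script>
        self.tags_inside_script = []  # start tags emitted while inside <script>
        self.after_script = []        # text seen outside <script>

    def handle_starttag(self, tag, attrs):
        if self.in_script:
            self.tags_inside_script.append(tag)
        elif tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # The text may arrive in several chunks, so collect rather than assign.
        if self.in_script:
            self.script_chunks.append(data)
        else:
            self.after_script.append(data)


parser = ScriptTextCollector()
parser.feed(HTML)
parser.close()
script_text = "".join(parser.script_chunks)
print(script_text == TEMPLATE)    # True if the template survived intact
print(parser.tags_inside_script)  # tags the tokenizer parsed inside the script
```

On a current Python 3 the whole template should come back as one run of raw text with no nested start tags, because HTMLParser treats script contents as raw text until the closing </script>. If your interpreter instead cuts the text off at the first closing tag, as in the question, that points at the parser version rather than your markup, and switching parsers is a reasonable workaround.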

In some cases I have had to switch to a different parser, such as lxml, html5lib, or another one.

Here is an example of the above:

from bs4 import BeautifulSoup

# markup is the HTML string to parse; the "lxml" parser requires the lxml package
soup = BeautifulSoup(markup, "lxml")

I suggest you read this section of the documentation for the BS version you are using: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
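If you aren't sure which parsers are installed, you can check before naming one. This is a minimal sketch, assuming Python 3.4+ and the preference order the Beautiful Soup documentation gives for its own default choice (lxml, then html5lib, then Python's built-in parser). pick_parser is a hypothetical helper; it only checks that the package is importable, not that it works with your Beautiful Soup version.

```python
import importlib.util

# Assumed preference order, mirroring Beautiful Soup's documented default ranking.
PREFERRED = ("lxml", "html5lib")


def pick_parser(preferred=PREFERRED):
    """Hypothetical helper: return the first importable parser package name,
    falling back to Python's built-in "html.parser"."""
    for name in preferred:
        # find_spec returns None (rather than raising) for a missing top-level package.
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"


print(pick_parser())
```

You would then pass the result explicitly, e.g. BeautifulSoup(html, pick_parser()), which also makes it clear in your code which parser built the tree.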


2013-11-20 17:48:58

I'll look into it; I have to install dependencies to get it working. – jumbopap

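Yes, you can install lxml directly with pip from your command line, or download the package from –

For some reason I couldn't get lxml working, but I got this working with the html5lib parser. Accepting yours as the correct answer :) Thanks. – jumbopap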
