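BeautifulSoup does not correctly parse a text/template script
I have a fairly complex template script that BeautifulSoup4 fails to parse for some reason. As you can see below, BS4 parses only part of the tree before giving up. Why is this, and is there a way to work around it?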
>>> from bs4 import BeautifulSoup
>>> html = """<script id="scriptname" type="text/template"><section class="sectionname"><header><h1>Test</h1></header><table><tr><th>Title</th><td class="class"></td><th>Title</th><td class="class"></td></tr><tr><th>Title</th><td class="class"></td><th>Another row</th><td class="checksum"></td></tr></table></section></script> Other stuff I want to stay"""
>>> soup = BeautifulSoup(html)
>>> soup.findAll('script')
[<script id="scriptname" type="text/template"><section class="sectionname"><header><h1>Test</script>]
Edit: on further testing, it seems that BS3 is able to parse this correctly for some reason:
>>> from BeautifulSoup import BeautifulSoup as bs3
>>> soup = bs3(html)
>>> soup.script
<script id="scriptname" type="text/template"><section class="sectionname"><header><h1>Test</h1></header><table><tr><th>Title</th><td class="class"></td><th>Title</th><td class="class"></td></tr><tr><th>Title</th><td class="class"></td><th>Another row</th><td class="checksum"></td></tr></table></section></script>
Which version? – bnjmn
I'm using version 4.3.2 – jumbopap
find_all and findAll are the same; either one works. See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#method-names – jumbopap
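One thing that may be worth trying is to name the parser explicitly and treat the script body as plain text, then parse that text as a separate document. The sketch below is only an illustration: it assumes Python 3 with a recent bs4 and the built-in "html.parser", and uses a shortened version of the markup from the question. The variable names are made up for the example, and the result on 4.3.2 or with another parser may differ.

```python
from bs4 import BeautifulSoup

# Shortened version of the markup from the question.
html = (
    '<script id="scriptname" type="text/template">'
    '<section class="sectionname"><header><h1>Test</h1></header>'
    '<table><tr><th>Title</th><td class="class"></td>'
    '<th>Another row</th><td class="checksum"></td></tr></table>'
    '</section></script> Other stuff I want to stay'
)

# Name the parser explicitly so the result does not depend on which
# parser libraries happen to be installed (assumption: html.parser).
soup = BeautifulSoup(html, "html.parser")

# The body of a <script> element is kept as one text node, not as child tags.
script = soup.find("script", id="scriptname")
template_source = script.string

# Parse the template text as its own document to query the markup inside it.
template = BeautifulSoup(template_source, "html.parser")
heading = template.h1.get_text()
cells = template.find_all("td")
```

Here `soup` still holds the outer document, including the trailing text after the script, while `template` only holds the markup that was inside it.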