BeautifulSoup cannot parse HTML with `html5lib`

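BeautifulSoup fails to parse an HTML page when I use the html5lib option, but works fine with the html.parser option. According to the docs, html5lib should be more lenient than html.parser, so why do I get garbled text when I use it to parse the page?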

Below is a small runnable example. (Switch between html5lib and html.parser to see whether the Chinese text is output correctly.)

# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

ss = requests.Session()
res = ss.get("http://tech.qq.com/a/20151225/050487.htm")
html = res.content.decode("GBK").encode("utf-8")
soup = BeautifulSoup(html, 'html5lib')
print(str(soup)[0:800])  # where you can see if the html is parsed normally or not


2015-12-25 foool

Don't re-encode your content. Leave the decoding to Beautiful Soup:

soup = BeautifulSoup(res.content, 'html5lib')
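One likely reason the re-encoded bytes come out garbled (an assumption consistent with the advice above, not something confirmed in the question) is that the page still declares a GB-family charset, so bytes that are really UTF-8 get decoded as GBK. The sketch below uses only the standard library and a made-up two-character sample; it does not call BeautifulSoup.

```python
# Hypothetical sample text standing in for the page's Chinese content.
text = u"\u4e2d\u56fd"

# What the question's code does: produce UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# What happens if a parser trusts a stale GB-family charset declaration:
# the UTF-8 bytes are interpreted as GBK, which yields different characters.
garbled = utf8_bytes.decode("gbk", "replace")

# Decoding with the encoding that was actually used round-trips cleanly.
restored = utf8_bytes.decode("utf-8")
```

Comparing `garbled` with `text` shows the mismatch, while `restored` equals the original string.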

If you do re-encode, you need to replace the meta header that is present in the source:

<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
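Replacing that header could look roughly like the following. This is a minimal sketch, not the answer's own code: the helper name `reencode_gbk_to_utf8`, the regular expression, and the inline sample page are all assumptions made for illustration, and a regex rewrite only handles simple `charset=...` declarations.

```python
import re

# Matches a GB-family charset declaration such as charset=gb2312 or charset="GBK".
CHARSET_RE = re.compile(r"(charset\s*=\s*[\"']?)gb(?:2312|k|18030)", re.IGNORECASE)

def reencode_gbk_to_utf8(raw_bytes):
    """Decode GBK bytes, rewrite the declared charset, return UTF-8 bytes."""
    text = raw_bytes.decode("gbk")
    # Make the meta header agree with the new byte encoding.
    text = CHARSET_RE.sub(r"\g<1>utf-8", text)
    return text.encode("utf-8")

# Hypothetical stand-in for res.content, encoded the way the real page is.
sample = (
    u'<html><head><meta http-equiv="Content-Type" '
    u'content="text/html; charset=gb2312"></head>'
    u"<body>\u4e2d\u56fd</body></html>"
).encode("gbk")

fixed = reencode_gbk_to_utf8(sample)
```

The resulting `fixed` bytes declare and use the same encoding, so they could then be passed to `BeautifulSoup(fixed, 'html5lib')` in place of the question's `html` variable.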

Or decode manually and pass in Unicode:

soup = BeautifulSoup(res.content.decode('gbk'), 'html5lib')


2015-12-25 14:34:45
