Python decoding errors with BeautifulSoup, requests, and lxml

I'm trying to pull some data from a popular browser-based game, but I'm running into decoding errors:
import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.neopets.com/")
p = BeautifulSoup(r.text)
This produces the following stack trace:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.linux-x86_64/egg/bs4/__init__.py", line 172, in __init__
File "build/bdist.linux-x86_64/egg/bs4/__init__.py", line 185, in _feed
File "build/bdist.linux-x86_64/egg/bs4/builder/_lxml.py", line 195, in feed
File "parser.pxi", line 1187, in lxml.etree._FeedParser.close (src/lxml/lxml.etree.c:87912)
File "parsertarget.pxi", line 130, in lxml.etree._TargetParserContext._handleParseResult (src/lxml/lxml.etree.c:97055)
File "lxml.etree.pyx", line 294, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:8862)
File "saxparser.pxi", line 274, in lxml.etree._handleSaxCData (src/lxml/lxml.etree.c:93385)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb1 in position 476: invalid start byte
Running the following:
print repr(r.text[476 - 10: 476 + 10])
produces:
u'ttp-equiv="X-UA-Comp'
I'm really not sure what the problem is here. Any help is greatly appreciated. Thanks.
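
One possible reason the slice above looks harmless: r.text is already a decoded Unicode string, while the offset in the error message most likely refers to a byte buffer inside the parser, so position 476 need not line up with index 476 of r.text. A minimal sketch for locating the offending byte in raw bytes instead is below. The helper name find_undecodable and the sample bytes are made up for illustration; with the real page you would pass r.content.

```python
def find_undecodable(raw, encoding="utf-8", context=10):
    """Return (offset, nearby bytes) for the first byte that fails to decode.

    Returns None when the whole input decodes cleanly.
    """
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset of the first undecodable byte.
        start = max(exc.start - context, 0)
        return exc.start, raw[start:exc.start + context]
    return None


# Made-up sample standing in for r.content; a lone 0xb1 is not valid UTF-8.
sample = b"<p>temperature \xb1 5</p>"
hit = find_undecodable(sample)
print(hit)
```

If this reports a byte such as 0xb1 in r.content, the page is probably not UTF-8 throughout, whatever its headers or meta tags claim.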
Have you tried using 'r.content'? BeautifulSoup does the decoding for you, but 'r.text' returns Unicode.
See my comment below. That seems to fail as well.
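
For context on the two attributes discussed above: r.content is the raw response body as bytes, and r.text is that body decoded with whatever r.encoding requests settled on. If both fail in the parser, one thing to try is decoding the bytes yourself with a short list of candidate encodings, to see which one the page is plausibly in. This is only a sketch: the function name decode_with_fallback and the candidate list are assumptions, not something from the question, and it does not by itself guarantee the lxml error goes away.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Only reached if the caller's list had no encoding that accepts the bytes
    # (latin-1 in the default list accepts any byte sequence).
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


# Made-up bytes standing in for r.content.
text, used = decode_with_fallback(b"5 \xb1 1")
print(used)
```

With the real response you would call decode_with_fallback(r.content) and hand the resulting string to BeautifulSoup, or compare the reported encoding with r.encoding to see whether requests guessed differently.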