I want to scrape a table from a Chinese website whose URL contains non-English characters. My plan is:
import pandas as pd
states = pd.read_html('http://baike.baidu.com/item/天津/132308',encoding='utf-8')
print(states[0])
Because the URL contains the non-English word "天津", this fails with the following error:
Traceback (most recent call last):
File "/Users/biyuntian/Documents/nihao.py", line 2, in <module>
fiddy_states = pd.read_html('http://baike.baidu.com/item/天津/132308',encoding='utf-8')
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 906, in read_html
keep_default_na=keep_default_na)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 743, in _parse
raise_with_traceback(retained)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/compat/__init__.py", line 344, in raise_with_traceback
raise exc.with_traceback(traceback)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-11: ordinal not in range(128)
How can I fix this? By the way, I am using Python 3 on a MacBook Air.
Have you tried an encoding other than ASCII?
The language may be English, but the characters are Latin.
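One possible workaround is to percent-encode the non-ASCII part of the URL before handing it to `pd.read_html`. This is a minimal sketch, assuming the error comes from the raw characters "天津" being sent in the HTTP request line, which would be consistent with "position 10-11" in a string like `GET /item/天津/...`. The helper name `encode_url` is hypothetical and not part of pandas.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url):
    """Return url with non-ASCII characters in its path percent-encoded as UTF-8.

    Hypothetical helper for illustration; only the path component is touched.
    """
    parts = urlsplit(url)
    # Keep "/" as the path separator and "%" so an already-encoded
    # path is not encoded a second time.
    safe_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment)
    )


url = encode_url('http://baike.baidu.com/item/天津/132308')
print(url)
# The encoded URL can then be passed on, for example:
#   states = pd.read_html(url, encoding='utf-8')
```

Whether the page then parses into tables depends on the site's response, so this only addresses the `UnicodeEncodeError` itself.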