2016-03-16

BeautifulSoup UnicodeEncodeError

I am trying to parse an HTML page that I saved to my computer (Windows 10):

from bs4 import BeautifulSoup

# Read the saved page as UTF-8 so the Japanese text is decoded correctly.
with open("res/JLPT N5 vocab list.html", "r", encoding="utf8") as f:
    soup = BeautifulSoup(f, "html.parser")

tables = soup.find_all("table")
sectable = tables[1]  # second table on the page
for tr in sectable.contents[1:]:
    if tr.name == "tr":
        try:
            print(tr.td.a.get_text())
        except AttributeError:
            # Skip rows whose first cell has no link.
            continue

This should print every Japanese word in the first column, but the line print(tr.td.a.get_text()) raises an error: UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>. How can I fix this error?

Answer

In the end I solved the problem with the help of the Miscellaneous section of the Beautiful Soup documentation:

UnicodeEncodeError: 'charmap' codec can't encode character u'\xfoo' in position bar (or just about any other UnicodeEncodeError) - This is not a problem with Beautiful Soup. This problem shows up in two main situations. First, when you try to print a Unicode character that your console doesn’t know how to display. (See this page on the Python wiki for help.) Second, when you’re writing to a file and you pass in a Unicode character that’s not supported by your default encoding. In this case, the simplest solution is to explicitly encode the Unicode string into UTF-8 with u.encode("utf8").

In my case, the cause was the first one: I was trying to print Unicode characters that my console did not know how to display.
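The failure can be reproduced without a console at all. The following is a minimal sketch, assuming the console used a legacy Western code page such as cp437 (the question does not say which one was active); the helper name can_encode is made up for illustration.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # This is the same exception that print() raises when the
        # console encoding has no mapping for a character.
        return False
    return True


word = "日本語"
# cp437 is an assumed stand-in for a Western console code page,
# cp932 is the Japanese Windows code page, utf-8 covers all of Unicode.
for name in ("cp437", "cp932", "utf-8"):
    print(name, can_encode(word, name))
```

Only an encoding that contains the Japanese characters can represent the word, which is why switching the console to a Japanese code page makes the error go away.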
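So I enabled a TrueType font for the console, changed the system locale to Japanese (this changes the console encoding and makes console fonts that support Japanese selectable), and then switched the console font to MS ゴシック (MS Gothic; this font only appeared in the list after I had changed the system locale).
If I want to write the output to a file instead, I just open the file and specify UTF-8 as the encoding.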
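Writing to a file with an explicit encoding could look roughly like this. It is only a sketch: the sample words, the temporary output path, and the helper names write_words and read_words are placeholders, not part of the original script.

```python
import os
import tempfile


def write_words(words, path):
    """Write one word per line, always encoding the file as UTF-8."""
    # Passing encoding explicitly avoids falling back to the system
    # default encoding, which may not cover Japanese characters.
    with open(path, "w", encoding="utf-8") as out:
        for word in words:
            out.write(word + "\n")


def read_words(path):
    """Read the words back, decoding the file as UTF-8."""
    with open(path, "r", encoding="utf-8") as src:
        return [line.rstrip("\n") for line in src]


# Hypothetical sample data standing in for the scraped first column.
sample_words = ["会う", "青い", "赤い"]
demo_path = os.path.join(tempfile.mkdtemp(), "words.txt")
write_words(sample_words, demo_path)
```

In the original loop, the words would be collected from tr.td.a.get_text() and passed to write_words in place of the sample list.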