Python decode with errors="replace"

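Using Python 2.7, I fetch some HTML from a website as a string and immediately decode it to unicode. Because I need to know later where any decoding errors occurred, I thought it would be best to use errors="replace" to prevent exceptions on non-ASCII characters: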

linkname = curlinkname.decode("utf-8", errors="replace")

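In most cases this replaces the problem characters with a placeholder. However, when I run into one particular character (ū), I still get an exception from that line of code: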

UnicodeEncodeError: 'charmap' codec can't encode character u'\u016b' in position 1: character maps to <undefined>

What is going on here?

2015-07-01 sssnakey
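One thing worth noting about the traceback: it is a UnicodeEncodeError, which Python raises while encoding, so the errors="replace" passed to decode() does not apply to it. The sketch below shows decoding and encoding as two separate steps, each with its own error handling. The bytes in `raw` and the choice of cp1252 as the target codec are assumptions made for illustration; the question does not say where the encode step happens.

```python
# -*- coding: utf-8 -*-
# Hypothetical input: "k", valid UTF-8 for U+016B, then an invalid byte 0xFF.
raw = b"k\xc5\xab\xff"

# Decoding: errors="replace" turns the undecodable byte into U+FFFD, no exception.
text = raw.decode("utf-8", errors="replace")  # u"k\u016b\ufffd"

# Encoding is a separate step. cp1252 (assumed here) is a charmap-based codec
# with no mapping for U+016B, so a strict encode raises UnicodeEncodeError.
try:
    text.encode("cp1252")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Passing errors="replace" to encode() substitutes "?" for unmappable characters.
safe = text.encode("cp1252", errors="replace")  # b"k??"
```

If the exception really does come from the decode line, one possibility is that curlinkname is already a unicode object, in which case Python 2 encodes it implicitly before decoding. The full traceback would confirm that.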

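Maybe the encoding is not UTF-8. Check it first; you can use this library for encoding detection: https://github.com/chardet/chardet – efirvida

Could you please share the full traceback? –

Are you reading a text file? – efirvida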

You need to install the library first:

pip install chardet

Then use it:

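import chardet

# detect() returns a dict that includes the guessed 'encoding'
code = chardet.detect(curlinkname)
# 'encoding' can be None when detection fails, so fall back to UTF-8
linkname = curlinkname.decode(code['encoding'] or 'utf-8', errors="replace")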

2015-07-01 19:16:50 efirvida
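The detection step above could be wrapped in a small helper along these lines. This is only a sketch: `decode_best_effort` and its `fallback` parameter are hypothetical names, not part of chardet, and the guarded import is there so the snippet still runs when chardet is not installed.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:
    chardet = None  # detection is skipped and the fallback encoding is used


def decode_best_effort(data, fallback="utf-8"):
    """Decode bytes using chardet's guess, replacing undecodable bytes.

    Hypothetical helper for illustration; `fallback` is used when chardet
    is unavailable, returns no guess, or names a codec Python lacks.
    """
    guess = chardet.detect(data) if chardet is not None else {}
    encoding = guess.get("encoding") or fallback
    try:
        return data.decode(encoding, errors="replace")
    except LookupError:
        # Raised when Python has no codec registered under the guessed name.
        return data.decode(fallback, errors="replace")


linkname = decode_best_effort(b"hello")
```

Detection is a statistical guess, so short inputs such as a single link name may be misidentified. Running it on the whole downloaded page, if that is available, is likely to be more reliable.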
