Dictionary keys with Unicode characters raise an error

The CSV file contains strings with unrecognized characters, and the JSON file contains a map from those strings to the correct ones.

FILE.CSV

0,�urawska A. 
1,Polnar J�zef

dict.json

{ 
    "\ufffdurawska A.": "\u017burawska A.", 
    "Polnar J\ufffdzef": "Polnar J\u00f3zef" 
}

parse.py

Traceback (most recent call last):
  File "parse.py", line 9, in <module>
    print proper_names[row[1].decode('utf-8')]
UnicodeEncodeError: 'ascii' codec can't encode character u'\u017b' in position 0: ordinal not in range(128)

How can I use the dictionary with decoded strings?

Source

2015-09-24 CodeNinja

To me it looks like your console cannot handle UTF-8. What do you get if you print a value directly to the console, e.g. 'print proper_names.values()[0]'?

'UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 8: ordinal not in range(128)' – CodeNinja

Looking at the error message, I think the problem is the value, not the key (\u017b is in the value).

So you also have to encode the result:

print proper_names[row[1].decode('utf-8')].encode('utf-8')
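
A minimal sketch of the whole round trip, assuming the CSV cell arrives as UTF-8 bytes (as it does from the csv module in Python 2). The mapping is inlined instead of being read from dict.json, and the names MAPPING_JSON and raw_cell are illustrative, not taken from the original script:

```python
import json

# The mapping from dict.json, inlined so the sketch is self-contained.
# The raw string keeps the \uXXXX escapes literal; json.loads decodes them.
MAPPING_JSON = r'''{
    "\ufffdurawska A.": "\u017burawska A.",
    "Polnar J\ufffdzef": "Polnar J\u00f3zef"
}'''

proper_names = json.loads(MAPPING_JSON)

# Hypothetical raw cell: U+FFFD is the byte sequence EF BF BD in UTF-8.
raw_cell = b'\xef\xbf\xbdurawska A.'

key = raw_cell.decode('utf-8')    # text key, matches the key loaded from JSON
value = proper_names[key]         # the lookup itself succeeds
encoded = value.encode('utf-8')   # explicit bytes, no implicit ASCII step
```

Here the dictionary lookup works on the decoded key, and only the final value needs an explicit encoding before it is written out.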

(Edit: corrected to address the comments, for future reference.)

Source

2015-09-24 11:22:34 Pieter21

That raises KeyError: '\xef\xbf\xbdurawska A.' – CodeNinja

'print proper_names[row[1].decode('utf-8')].encode('utf-8')' <- this is the correct answer – CodeNinja
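
On the KeyError reported in the comments above: the key in that message is a UTF-8 byte string, while the JSON mapping has text (unicode) keys, so an undecoded CSV cell does not match any of them. A minimal sketch, assuming the error came from a lookup without .decode('utf-8') (the thread does not show that version of the code); the one-entry dictionary and raw_cell are illustrative stand-ins:

```python
from __future__ import print_function

# Stand-in for the mapping loaded from dict.json (text keys and values).
proper_names = {u'\ufffdurawska A.': u'\u017burawska A.'}

# Hypothetical undecoded CSV cell, as UTF-8 bytes.
raw_cell = b'\xef\xbf\xbdurawska A.'

# The byte string is not a key of the dictionary ...
found_raw = raw_cell in proper_names

# ... but the decoded text is.
found_decoded = raw_cell.decode('utf-8') in proper_names

print(found_raw, found_decoded)
```

So the decode step on the key has to stay, and the encode step is added on top of it for the value.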

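I was able to reproduce the error and pin down where it happens. In fact, using a dictionary with unicode keys causes no problem; the error occurs when you try to print a unicode character that cannot be represented in ASCII. If you split the print into two lines: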

for row in reader: 
    val = proper_names[row[1].decode('utf-8')] 
    print val

The error will occur on the print line.
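
The failure can be reproduced without a console at all. A small sketch, assuming that print falls back to the ASCII codec here (which is what the traceback suggests); the explicit .encode('ascii') call stands in for that implicit step, and the name failure is illustrative:

```python
# The value that the dictionary lookup returns for the first CSV row.
val = u'\u017burawska A.'

failure = None
try:
    # Stand-in for the implicit encoding that print performs
    # when the output encoding is ASCII.
    val.encode('ascii')
except UnicodeEncodeError as exc:
    failure = exc

# The exception records which codec failed and at which position.
codec_name = failure.encoding
position = failure.start
```

The codec and position match the ones in the traceback: 'ascii' and position 0, the first character of the value.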

You have to encode it with a suitable charset. The one I know best is latin1, but it cannot represent \u017b, so once again I use UTF-8:

for row in reader: 
    val = proper_names[row[1].decode('utf-8')] 
    print val.encode('utf8')

or directly:

for row in reader: 
    print proper_names[row[1].decode('utf-8')].encode('utf8')
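
To check which charsets can hold the two names from the question, something like the following can be used. It is only a sketch; can_encode is a hypothetical helper, not part of any library:

```python
from __future__ import print_function


def can_encode(text, codec):
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The two corrected names from dict.json.
names = [u'\u017burawska A.', u'Polnar J\u00f3zef']

# Map each (name, codec) pair to whether the encoding succeeds.
results = {}
for name in names:
    for codec in ('ascii', 'latin-1', 'utf-8'):
        results[(name, codec)] = can_encode(name, codec)

for (name, codec), ok in sorted(results.items()):
    print(repr(name), codec, ok)
```

latin-1 copes with \u00f3 but not with \u017b, while UTF-8 handles both, which is why UTF-8 is used above.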

Source

2015-09-24 11:46:06
