Python: UnicodeEncodeError: 'latin-1' codec can't encode character

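I have a scenario where I call an API and, based on its results, query the database for each record it returns. The API call returns strings, and when I run the database query with the items returned by the API, I get the following error for some elements.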

Traceback (most recent call last): 
    File "TopLevelCategories.py", line 267, in <module> 
    cursor.execute(categoryQuery, {'title': startCategory}); 
    File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/cursors.py", line 158, in execute 
    query = query % db.literal(args) 
    File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/connections.py", line 265, in literal 
    return self.escape(o, self.encoders) 
    File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/connections.py", line 203, in unicode_literal 
    return db.literal(u.encode(unicode_literal.charset)) 
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 3: ordinal not in range(256)

The segment of my code that the above error refers to is:

  ...  
     for startCategory in value[0]: 
      categoryResults = [] 
      try: 
       categoryRow = "" 
       baseCategoryTree[startCategory] = [] 
       #print categoryQuery % {'title': startCategory}; 
       cursor.execute(categoryQuery, {'title': startCategory}) #unicode issue 
       done = False 
       cont...

After some Googling, I tried the following at the command line to understand what is going on...

>>> import sys 
>>> u'\u2013'.encode('iso-8859-1') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 0: ordinal not in range(256) 
>>> u'\u2013'.encode('cp1252') 
'\x96' 
>>> '\u2013'.encode('cp1252') 
'\\u2013' 
>>> u'\u2013'.encode('cp1252') 
'\x96'

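But I am not sure how to turn this into a solution for the problem. I also do not know the theory behind encode('cp1252'); it would be great to get some explanation of what I tried above.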

2011-11-28 Null-Hypothesis

Possible duplicate of [UnicodeEncodeError: 'latin-1' codec can't encode character](http://stackoverflow.com/questions/3942888/unicodeencodeerror-latin-1-codec-cant-encode-character)

If you need Latin-1 encoding, you have several options to get rid of the en dash or other code points above 255 (characters not included in Latin-1):

>>> u = u'hello\u2013world' 
>>> u.encode('latin-1', 'replace') # replace it with a question mark 
'hello?world' 
>>> u.encode('latin-1', 'ignore')  # ignore it 
'helloworld'

Or do your own custom replacement:

>>> u.replace(u'\u2013', '-').encode('latin-1') 
'hello-world'

If you are not required to output Latin-1, then UTF-8 is a common and preferred choice. It is recommended by the W3C and encodes all Unicode code points nicely:

>>> u.encode('utf-8') 
'hello\xe2\x80\x93world'
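
For readers on Python 3, the options above can be combined in a small helper. This is only a sketch: the name to_latin1 and the REPLACEMENTS table are made up for illustration and are not part of any library. It first swaps a few typographic characters for ASCII look-alikes, then lets the chosen error handler deal with whatever else does not fit into Latin-1.

```python
# Hypothetical helper (not from the original answer): encode text as Latin-1,
# substituting a few common typographic characters first.
REPLACEMENTS = {
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
}


def to_latin1(text, errors="replace"):
    """Return text encoded as Latin-1 bytes.

    Characters listed in REPLACEMENTS are swapped for ASCII look-alikes;
    anything else outside Latin-1 is handled by the `errors` policy.
    """
    table = str.maketrans(REPLACEMENTS)
    return text.translate(table).encode("latin-1", errors)


print(to_latin1("hello\u2013world"))
print(to_latin1("snow\u2603man"))
print(to_latin1("snow\u2603man", errors="ignore"))
```

Passing errors="strict" instead would raise UnicodeEncodeError for anything the table does not cover, which may be preferable when silent data loss is not acceptable.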

2011-11-28 00:32:37

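The Unicode character u'\u2013' is the "en dash". It is contained in the Windows-1252 (cp1252) character set (encoded as byte 0x96), but not in the Latin-1 (iso-8859-1) character set. The Windows-1252 character set defines some additional characters in the range 0x80-0x9f, among them the en dash.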
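
The difference between the two code pages can be checked directly. A minimal sketch in Python 3 syntax (the helper name encodable is an assumption introduced for illustration):

```python
EN_DASH = "\u2013"


def encodable(text, codec):
    """Return True if every character of text exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(encodable(EN_DASH, "latin-1"))   # Latin-1 has no en dash
print(encodable(EN_DASH, "cp1252"))    # Windows-1252 maps it to byte 0x96
print(EN_DASH.encode("cp1252"))

# Going the other way, byte 0x96 means different things in each code page:
print(repr(b"\x96".decode("cp1252")))   # the en dash
print(repr(b"\x96".decode("latin-1")))  # a C1 control character, U+0096
```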

The solution is to choose a target character set other than Latin-1, such as Windows-1252 or UTF-8, or to replace the en dash with a simple "-".
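
Applied to the traceback in the question, the failing step is the driver encoding the parameter with the connection's character set (u.encode(unicode_literal.charset)). The sketch below imitates only that one step in Python 3 syntax with a made-up function, encode_param; it is not MySQLdb code, and the sample title is hypothetical. MySQLdb.connect() does accept a charset keyword argument (for example charset='utf8'), which is the usual way to change the connection's character set, but whether that matches how your tables are stored is an assumption you would need to verify.

```python
def encode_param(value, charset):
    """Imitate the encode step from the traceback for one query parameter."""
    return value.encode(charset)


title = "A \u2013 B"   # hypothetical category title containing an en dash

for charset in ("latin-1", "cp1252", "utf-8"):
    try:
        print(charset, encode_param(title, charset))
    except UnicodeEncodeError as exc:
        # Only latin-1 is expected to land here for this title.
        print(charset, "failed:", exc.reason)
```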

2011-11-28 00:25:06 Cito

u.encode('utf-8') converts it to bytes, which can then be printed on stdout using sys.stdout.buffer.write(bytes). Check out the displayhook at https://docs.python.org/3/library/sys.html
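
A small sketch of that approach for Python 3, where sys.stdout.buffer is the binary layer underneath sys.stdout. The function name write_utf8 and its stream parameter are assumptions added so the example can be run against an in-memory buffer.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Encode text as UTF-8 and write the raw bytes to a binary stream."""
    if stream is None:
        stream = sys.stdout.buffer   # binary layer under sys.stdout
    data = text.encode("utf-8")
    stream.write(data)
    stream.flush()
    return len(data)


# Writing to an in-memory buffer shows the exact bytes that would be sent.
buf = io.BytesIO()
write_utf8("hello\u2013world", buf)
print(buf.getvalue())
```

Calling write_utf8(text) with no stream writes the same bytes to the terminal, bypassing whatever encoding sys.stdout itself was opened with.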

2017-09-20 22:33:47 PriyankaP
