Decoding an encoded Unicode string in Python

I need to decode a "UNICODE" encoded string:

>>> id = u'abcdß' 
>>> encoded_id = id.encode('utf-8') 
>>> encoded_id 
'abcd\xc3\x9f'

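My problem is: using Pylons routes, I get the encoded_id variable as a unicode string u'abcd\xc3\x9f' instead of just a regular string 'abcd\xc3\x9f'.

Using Python, how can I decode my encoded_id variable when it is a unicode string?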

>>> encoded_id = u'abcd\xc3\x9f' 
>>> encoded_id.decode('utf-8') 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/test/vng/lib64/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 4-5: ordinal not in range(128)

Source

2013-09-27 alloyoussef

If possible, you should figure out why the string you get from Pylons was incorrectly decoded as 'latin-1' (or its close cousin, 'Windows-1252') instead of 'UTF-8' in the first place.

You have UTF-8 encoded data (there is no such thing as UNICODE encoded data).
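A side note on the traceback in the question: in Python 2, calling .decode() on a unicode object first implicitly encodes it to a byte string with the default ASCII codec, which is why the failure is a UnicodeEncodeError rather than a decode error. Here is a rough sketch that makes that hidden step explicit. It is written in Python 3 syntax (where str plays the role of Python 2's unicode), and the variable names are only illustrative:

```python
# The mis-decoded value from the question: UTF-8 bytes read as Latin-1 text.
mangled = 'abcd\xc3\x9f'

# Python 2's unicode.decode() effectively does this first, and it fails
# because U+00C3 and U+009F are outside the ASCII range.
try:
    mangled.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc
    message = str(exc)

# The failing positions match the "position 4-5" reported in the traceback.
failing_span = (error.start, error.end - 1)
```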

Encode the unicode value to Latin-1, then decode from UTF-8:

encoded_id.encode('latin1').decode('utf8')

Latin-1 maps the first 256 Unicode code points (U+0000 through U+00FF) one-to-one to bytes.
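To see why that property makes the trick lossless, here is a small check, again in Python 3 syntax and with illustrative variable names. It shows that every byte value decodes under Latin-1 to the code point with the same number, so encoding back recovers the original bytes exactly:

```python
# All 256 possible byte values, in order.
all_bytes = bytes(range(256))

# Decoding as Latin-1 never fails and yields code points 0..255 in order.
as_text = all_bytes.decode('latin-1')
code_points = [ord(ch) for ch in as_text]

# Encoding back gives exactly the bytes we started with.
round_tripped = as_text.encode('latin-1')

# Applied to the value from the question, this recovers the raw UTF-8 bytes.
raw = 'abcd\xc3\x9f'.encode('latin-1')
```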

Demo:

>>> encoded_id = u'abcd\xc3\x9f' 
>>> encoded_id.encode('latin1').decode('utf8') 
u'abcd\xdf' 
>>> print encoded_id.encode('latin1').decode('utf8') 
abcdß
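If you are on Python 3, or cannot be sure every incoming value is mangled, something along these lines might be a safer wrapper around the same round trip. This is only a sketch: fix_mojibake is a made-up helper name, not part of Pylons or the standard library, and it assumes the only damage is UTF-8 bytes that were decoded as Latin-1.

```python
def fix_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1.

    Hypothetical helper: returns the input unchanged when the
    round trip is not possible.
    """
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters above U+00FF, or the resulting
        # bytes are not valid UTF-8, so it was probably not mangled this way.
        return text


fixed = fix_mojibake('abcd\xc3\x9f')    # the mangled value from the question
untouched = fix_mojibake('abcd\xdf')    # already correct, left as-is
```

Note that a string which is already correct but happens to round-trip cleanly would still be altered, so fixing the decoding at the source remains the better option.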

Source

2013-09-27 17:42:40
