How to convert a string to bytes in Python 2

I know this may sound like a duplicate question, but that's because I don't know how to describe the problem properly.

For some reason, I ended up with a unicode string like this:

a = u'\xcb\xea'

As you can see, these are actually the bytes that represent a Chinese character, encoded in GBK:

>>> print(b'\xcb\xea'.decode('gbk'))
岁

u'岁' is what I need, but I don't know how to convert u'\xcb\xea' to b'\xcb\xea'.
Any suggestions?


2014-05-07 laike9m

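This is not really a byte representation; what you have is still a sequence of Unicode code points. They are the wrong code points because the bytes were decoded as if they had been encoded as Latin-1.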
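To make this concrete, here is a minimal sketch (written for Python 3, which is an assumption; the question itself uses Python 2) of how such a string can come about. The variable names are illustrative only.

```python
# The two bytes that encode the character U+5C81 in GBK.
raw = b'\xcb\xea'

# Decoding with the wrong codec (Latin-1) never fails: each byte simply
# becomes the code point with the same numeric value.
wrong = raw.decode('latin-1')

# Decoding with the intended codec yields a single Chinese character.
right = raw.decode('gbk')

print([hex(ord(ch)) for ch in wrong])  # ['0xcb', '0xea']
print(hex(ord(right)))                 # 0x5c81
```

The string `wrong` is exactly the u'\xcb\xea' from the question: two code points that happen to carry the byte values.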

Encode to Latin-1 (whose code points map one-to-one to bytes), then decode as GBK:

a.encode('latin1').decode('gbk')

Demo:

>>> a = u'\xcb\xea'
>>> a.encode('latin1').decode('gbk')
u'\u5c81'
>>> print a.encode('latin1').decode('gbk')
岁
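The same round trip can be wrapped in a small function. This is only a sketch for Python 3; `redecode` and its parameter names are hypothetical, not part of any library.

```python
def redecode(text, wrong_codec='latin-1', right_codec='gbk'):
    """Undo a decode done with wrong_codec, then decode with right_codec.

    redecode is a hypothetical helper name used for illustration.
    """
    # Latin-1 maps code points 0-255 to the byte with the same value,
    # so this step recovers the original bytes exactly.
    raw = text.encode(wrong_codec)
    return raw.decode(right_codec)


a = '\xcb\xea'
print(redecode(a))  # 岁
```

Note that the encode step raises UnicodeEncodeError if the string contains any code point above 255, which is a useful sign that the text was not mis-decoded as Latin-1 in the first place.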


2014-05-07 15:02:12

Surprisingly, that's the answer! – laike9m

For Python 2, the simplest way is to use repr():

>>> key_unicode = u'uuuu\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1' 
>>> key_ascii = 'uuuu\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1' 
>>> print(key_ascii) 
uuuu��_��9ԣ� 
>>> print(key_unicode) 
uuuuö_¡ë9Ô£Ñ 
>>> 
>>> # here is the safe method for both string types:
>>> print(repr(key_ascii).lstrip('u')[1:-1]) 
uuuu\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1 
>>> print(repr(key_unicode).lstrip('u')[1:-1]) 
uuuu\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1 
>>> # ____________WARNING!______________
>>> # if you use just `str.strip('u\'\"')`, you will lose
>>> # the "uuuu" (and quotes, if any are present) at the ends of the string:
>>> print(repr(key_unicode).strip('u\'\"')) 
\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1
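As an alternative to slicing the output of repr(), the standard `unicode_escape` codec produces the same escaped text for this key. This is a different technique from the one above, sketched here for Python 3 as an assumption about the reader's environment.

```python
key = 'uuuu\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1'

# unicode_escape turns every non-ASCII or non-printable character into a
# backslash escape; the result is pure ASCII, so it decodes back to str.
escaped = key.encode('unicode_escape').decode('ascii')

print(escaped)  # uuuu\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1
```

Because nothing is stripped from the ends, the leading "uuuu" survives without any special handling.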

For Python 3, use str.encode() to get a bytes object:

>>> key = 'l\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1q\xf5L\xa9\xdd0\x90\x8b\xf5ht\x86za\x0e\x1b\xed\xb6(\xaa+' 
>>> key 
'lö\x9f_¡\x05ë9Ô£ÑqõL©Ý0\x90\x8bõht\x86za\x0e\x1bí¶(ª+' 
>>> print(key) 
lö_¡ë9Ô£ÑqõL©Ý0õhtzaí¶(ª+ 
>>> print(repr(key.encode()).lstrip('b')[1:-1]) 
l\xc3\xb6\xc2\x9f_\xc2\xa1\x05\xc3\xab9\xc3\x94\xc2\xa3\xc3\x91q\xc3\xb5L\xc2\xa9\xc3\x9d0\xc2\x90\xc2\x8b\xc3\xb5ht\xc2\x86za\x0e\x1b\xc3\xad\xc2\xb6(\xc2\xaa+
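One caveat worth illustrating: in Python 3, str.encode() with no argument uses UTF-8, so each code point above 127 becomes two bytes. To get back the original byte values asked about in the question, the codec has to be named explicitly. A short sketch, using the string from the question:

```python
a = '\xcb\xea'

utf8_bytes = a.encode()             # default codec is UTF-8
latin1_bytes = a.encode('latin-1')  # one byte per code point

print(utf8_bytes)    # b'\xc3\x8b\xc3\xaa'
print(latin1_bytes)  # b'\xcb\xea'

# Only the Latin-1 bytes decode to the intended GBK character.
print(latin1_bytes.decode('gbk'))  # 岁
```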


2016-02-02 16:08:28
