Python string encoding problem

Is there a function in Python that is equivalent to prefixing a string with 'u'?

Say I have a string:

a = 'C\xc3\xa9dric Roger'

and I want to convert it to:

b = u'C\xc3\xa9dric Roger'

so that I can compare it with other Unicode objects. How can I do this? My first instinct was to try:

>>> b = unicode(a)
Traceback (most recent call last):
  File "<string>", line 1, in <fragment>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

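But this appears to try to decode the string. Is there a function that converts to unicode without doing any kind of decoding? (Is that what the 'u' prefix actually does, or have I misunderstood?)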

2013-12-19 John Greenall

You need to specify the encoding:

unicode(a, 'utf8')

or use str.decode():

a.decode('utf8')

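But do pick the right codec for your input; you clearly have UTF-8 data here, but that is not always the case.

To understand what this does, I strongly recommend that you read: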
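As a minimal sketch of why the codec choice matters, the snippet below decodes the bytes from the question three ways. It is written with a b'' literal so that it also runs on Python 3; in Python 2 a plain 'str' literal has the same byte-string type. The variable names (raw, as_utf8, as_latin1, ascii_failed) are illustrative only and do not come from the question.

```python
# Bytes as they appear in the question. In Python 2 a plain 'str' literal
# has this type; the b'' prefix keeps the example valid in Python 3 too.
raw = b'C\xc3\xa9dric Roger'

# Decoding with the ASCII codec fails on byte 0xc3. This is what
# unicode(a) does implicitly in Python 2 when no encoding is given.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

as_utf8 = raw.decode('utf-8')      # the right codec for this input
as_latin1 = raw.decode('latin-1')  # wrong codec: no error, but wrong text

print(ascii_failed)
print(repr(as_utf8))
print(len(as_utf8), len(as_latin1))
```

Decoding as Latin-1 maps every byte to the code point with the same number, so it yields exactly u'C\xc3\xa9dric Roger', the value the question asks for. That string reads "CÃ©dric Roger" and is one character longer than the intended name, which is why the UTF-8 result is the one to compare against other Unicode objects.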

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder

2013-12-19 16:54:47

Sorry if I'm being dense here, but unicode('C\xc3\xa9dric Roger', 'utf8') does not produce u'C\xc3\xa9dric Roger'...

@JohnGreenall: No, because you now have a *Unicode* value; C3 A9 is the UTF-8 encoding of the Unicode code point U+00E9, LATIN SMALL LETTER E WITH ACUTE. When representing a unicode string, Python shows that character as u'\xe9'.
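The relationship described in this comment can be checked directly. The following is a small sketch, not part of the original thread, that uses the standard unicodedata module; the names e_acute and encoded are illustrative.

```python
import unicodedata

e_acute = u'\xe9'  # a single Unicode code point, U+00E9

# The code point number and its official Unicode name.
print(hex(ord(e_acute)))
print(unicodedata.name(e_acute))

# Encoding that one code point as UTF-8 gives the two bytes C3 A9.
encoded = e_acute.encode('utf-8')
print(repr(encoded))

# Decoding those two bytes as UTF-8 gives the single code point back.
print(encoded.decode('utf-8') == e_acute)
```

So the two escapes \xc3\xa9 in the byte string and the single escape \xe9 in the unicode string describe the same character at different levels: encoded bytes versus the code point.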

@JohnGreenall: Again, *please* read the links included in my answer; there are some fundamental concepts you need to understand here.