Improving my understanding of Unicode in Python (2.7)

I observe that in the following program:

# -*- coding: utf-8 -*-
words = ['artists', 'Künstler', '艺术家', 'Митець']
for word in words:
    print word, type(word)

it is not strictly necessary to mark the strings explicitly as unicode strings, like this:

words = ['artist', u'Künstler', u'艺术家', u'Митець']

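The different alphabets are handled just fine without the 'u' prefix.

So it appears that once coding: utf-8 is specified, all strings are encoded in Unicode. Is that true?

Or is unicode used only when a string no longer fits in range(128)?
Why does type(word) report <str> in all cases? Isn't unicode a special data type?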


2016-02-25 Calaf

Mandatory reading: http://bit.ly/unipain – Daenyth

Perhaps this will make it clearer:

# -*- coding: utf-8 -*-
words = ['artists', 'Künstler', '艺术家', 'Митець']
for word in words:
    print word, type(word), repr(word)
words = [u'artists', u'Künstler', u'艺术家', u'Митець']
for word in words:
    print word, type(word), repr(word)

Output:

artists <type 'str'> 'artists'
Künstler <type 'str'> 'K\xc3\xbcnstler'
艺术家 <type 'str'> '\xe8\x89\xba\xe6\x9c\xaf\xe5\xae\xb6'
Митець <type 'str'> '\xd0\x9c\xd0\xb8\xd1\x82\xd0\xb5\xd1\x86\xd1\x8c'
artists <type 'unicode'> u'artists'
Künstler <type 'unicode'> u'K\xfcnstler'
艺术家 <type 'unicode'> u'\u827a\u672f\u5bb6'
Митець <type 'unicode'> u'\u041c\u0438\u0442\u0435\u0446\u044c'

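In the first case, your strings are byte strings encoded in the declared source encoding, UTF-8. They will only display correctly on a UTF-8 terminal.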

In the second case, you get Unicode strings. They will display correctly on any terminal whose encoding supports the characters.
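As a rough sketch of the difference between the two cases, the snippet below spells out the UTF-8 bytes from the repr output above as an explicit byte string and compares it with the corresponding text. The variable names are only illustrative, and the code is written so that it should run on Python 2.7 as well as Python 3 (where the text type is called str rather than unicode).

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# The UTF-8 bytes that a plain 'Künstler' literal holds in Python 2.7
# when the source file is declared as UTF-8 (taken from the repr output).
byte_string = b'K\xc3\xbcnstler'

# The same word as text (type unicode in 2.7, str in 3).
text = u'K\xfcnstler'

# len() counts bytes for the byte string and code points for the text.
byte_len = len(byte_string)  # 'ü' occupies two bytes in UTF-8
text_len = len(text)         # 'ü' is a single code point

# Decoding the bytes as UTF-8 recovers the text; encoding reverses it.
decoded = byte_string.decode('utf-8')
encoded = text.encode('utf-8')

print(byte_len, text_len, decoded == text, encoded == byte_string)
```

The byte string is one element longer than the text because 'ü' is stored as the two bytes \xc3\xbc, which is why a terminal has to know the encoding before it can show the right character.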

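Here is how the strings display on a Windows console using code page 437, with a Python environment variable set so that Python replaces unsupported characters instead of raising the default UnicodeEncodeError exception: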

c:\>set PYTHONIOENCODING=cp437:replace 
c:\>py -2 x.py 
artists <type 'str'> 'artists' 
K├╝nstler <type 'str'> 'K\xc3\xbcnstler' 
Φë║µ£»σ«╢ <type 'str'> '\xe8\x89\xba\xe6\x9c\xaf\xe5\xae\xb6' 
╨£╨╕╤é╨╡╤å╤î <type 'str'> '\xd0\x9c\xd0\xb8\xd1\x82\xd0\xb5\xd1\x86\xd1\x8c' 
artists <type 'unicode'> u'artists' 
Künstler <type 'unicode'> u'K\xfcnstler' 
??? <type 'unicode'> u'\u827a\u672f\u5bb6' 
?????? <type 'unicode'> u'\u041c\u0438\u0442\u0435\u0446\u044c'

The byte strings are mostly garbage, but the Unicode strings are sensible, given that this code page doesn't support Chinese or Russian.
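The console output above can be approximated without a Windows machine. This is only a sketch of what presumably happens: it assumes the console interprets raw bytes as cp437, and that Python encodes Unicode strings with the codec and error handler named in PYTHONIOENCODING. The variable names are illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# UTF-8 bytes of u'Митець', copied from the repr output.
utf8_bytes = b'\xd0\x9c\xd0\xb8\xd1\x82\xd0\xb5\xd1\x86\xd1\x8c'
text = u'\u041c\u0438\u0442\u0435\u0446\u044c'

# Byte-string case: the bytes go to the console untouched, and the console
# reads each byte as one cp437 character, producing box-drawing garbage.
mojibake = utf8_bytes.decode('cp437')

# Unicode case: the text is encoded for the console. cp437 has no Cyrillic,
# so the 'replace' error handler substitutes '?' for every character.
replaced = text.encode('cp437', 'replace')

# 'ü' does exist in cp437, so the German word survives intact.
kuenstler = u'K\xfcnstler'.encode('cp437', 'replace')

print(len(mojibake), replaced, kuenstler)
```

Twelve bytes become twelve unrelated characters in the first case, while the second case degrades to six question marks, matching the two 'Митець' lines in the console listing.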


2016-02-25 17:32:57

Unrelated: Python could use the Unicode API to [print Unicode strings to the Windows console](http://stackoverflow.com/a/32176732/4279), which might not depend on what 'chcp' returns. – jfs

So it appears that once coding: utf-8 is specified, all strings are encoded in Unicode. Is that true?

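No. It means that byte sequences in the source code are interpreted as UTF-8. You have created byte strings, and the system is naively interpreting their contents (as opposed to creating text with u'...').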
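A small sketch may help with the range(128) part of the question as well. It assumes the byte values shown in the repr output earlier in the thread and uses illustrative variable names; it should run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

# The bytes that the plain (unprefixed) Chinese literal holds in Python 2.7.
raw = b'\xe8\x89\xba\xe6\x9c\xaf\xe5\xae\xb6'

# A byte string keeps the same type whether or not its content is ASCII;
# nothing switches to unicode once a byte falls outside range(128).
same_type = type(b'artists') is type(raw)

# The bytes are not valid ASCII, so treating them as ASCII text fails.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Text only appears after an explicit decode with the right encoding.
text = raw.decode('utf-8')
is_text = isinstance(text, type(u''))
code_points = [hex(ord(ch)) for ch in text]

print(same_type, ascii_ok, is_text, code_points)
```

The code points recovered by the explicit decode are the same ones that repr shows for the u'...' version of the literal, which is the conversion the u prefix asks Python to do at compile time using the declared source encoding.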


2016-02-25 05:22:51
