len() with Unicode strings

print "\xE2\x82\xAC" 
print len("€") 
print len(u"€")

I get:

€ 
3 
1

However, if I do:

print '\xf0\xa4\xad\xa2'
print len("𤭢")
print len(u"𤭢")

I get:

𤭢
4 
2

In the second example, for the one-character unicode string u"𤭢", the len() function returns 2 instead of 1.

Can someone explain to me why this happens?

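Python 2 can use UTF-16 as the internal encoding for unicode objects (a so-called "narrow" build), which means 𤭢 (U+24B62) is encoded as two surrogates: D852 DF62. In this case, len returns the number of UTF-16 code units, not the number of actual Unicode code points.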
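To make the surrogate arithmetic concrete, here is a minimal Python 3 sketch. The helper names to_surrogate_pair and utf16_length are made up for illustration and are not part of any library; the second helper only mimics what len would report on a narrow build by counting UTF-16 code units.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def utf16_length(text):
    """Count UTF-16 code units, which is what a narrow build's len() counts."""
    return len(text.encode("utf-16-le")) // 2


high, low = to_surrogate_pair(0x24B62)
print("%04X %04X" % (high, low))        # D852 DF62
print(utf16_length("\U00024B62"))       # 2 code units for one code point
print(utf16_length("\u20ac"))           # 1 code unit for the euro sign
```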

Python 2 can also be compiled with UTF-32 enabled for unicode (a so-called "wide" build), which means most unicode objects take twice as much memory, but then len(u'𤭢') == 1.
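If you are not sure which kind of build you are running, sys.maxunicode tells you. A small sketch follows; the function name build_kind is just an illustrative choice. The same code should run on Python 2 and Python 3.

```python
import sys


def build_kind():
    """Return "narrow" if the interpreter stores text as 16-bit units, else "wide"."""
    # Narrow builds report 0xFFFF; wide builds and Python 3.3+ report 0x10FFFF.
    return "wide" if sys.maxunicode > 0xFFFF else "narrow"


print(build_kind(), hex(sys.maxunicode))
```

On a narrow build this prints "narrow", and that is the case where len can report 2 for a single character outside the Basic Multilingual Plane.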

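Python 3's str objects, since 3.3, switch on demand between 1-, 2- and 4-byte-per-character storage (ISO-8859-1, UCS-2 and UCS-4), so you will never encounter this problem: len('𤭢') == 1.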
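You can observe the on-demand switching indirectly with sys.getsizeof. This is only a rough sketch that assumes CPython 3.3 or later (other implementations may store strings differently), and bytes_per_char is a hypothetical helper name.

```python
import sys


def bytes_per_char(ch):
    """Estimate per-character storage by comparing two freshly built strings."""
    # Assumes CPython 3.3+; the fixed header cancels out in the difference.
    return (sys.getsizeof(ch * 2000) - sys.getsizeof(ch * 1000)) // 1000


for label, ch in [("ASCII a", "a"), ("euro sign", "\u20ac"), ("U+24B62", "\U00024B62")]:
    print(label, "len =", len(ch), "bytes per char =", bytes_per_char(ch))
```

On CPython I would expect 1, 2 and 4 bytes per character respectively, while len stays 1 for each single character.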

str in Python 3.0 to 3.2 is the same as unicode in Python 2.

2014-07-19 17:44:19

How can I loop through a Unicode string that contains characters encoded like this? Something like u"𤭢". – lessthanl0l

@lessthanl0l: Try something like this: http://stackoverflow.com/questions/7494064/how-to-iterate-over-unicode-characters-in-python-3
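For what it's worth, one way to loop by code point is to merge surrogate pairs by hand. The generator below is a sketch, not taken from the linked question, and iter_code_points is a made-up name. It is written for Python 3, where a string holding the two surrogates separately ('\ud852\udf62') stands in for what a narrow build would hand you.

```python
def iter_code_points(text):
    """Yield integer code points, merging any high/low surrogate pair into one."""
    i = 0
    n = len(text)
    while i < n:
        unit = ord(text[i])
        if 0xD800 <= unit <= 0xDBFF and i + 1 < n:
            low = ord(text[i + 1])
            if 0xDC00 <= low <= 0xDFFF:
                # Recombine the pair into a single code point above U+FFFF.
                yield 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
                i += 2
                continue
        # Ordinary character (or an unpaired surrogate, passed through as-is).
        yield unit
        i += 1


narrow_style = "a\ud852\udf62b"   # surrogates stored as two separate units
print(len(narrow_style))          # 4
print(["U+%04X" % cp for cp in iter_code_points(narrow_style)])
# ['U+0061', 'U+24B62', 'U+0062']
```

On Python 3.3+ a plain for loop over the string already yields whole characters, so this is only needed for narrow builds or for data that arrives as UTF-16 code units.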

Answers