Getting the same Unicode string length in Python 2 and Python 3?

Ugh, Python 2/3 is so frustrating... Consider this example, test.py:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
if sys.version_info[0] < 3:
    text_type = unicode  # Python 2: unicode is text, str is bytes
    binary_type = str
    def b(x):
        return x
    def u(x):
        return unicode(x)  # no encoding given, so ASCII is assumed
else:
    text_type = str  # Python 3: str is text, bytes is binary
    binary_type = bytes
    import codecs
    def b(x):
        return codecs.latin_1_encode(x)[0]
    def u(x):
        return x

tstr = " ▲ "

sys.stderr.write(tstr)
sys.stderr.write("\n")
sys.stderr.write(str(len(tstr)))
sys.stderr.write("\n")

Running it:

$ python2.7 test.py 
▲ 
5 
$ python3.2 test.py 
▲ 
3
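The 5 versus 3 is most likely bytes versus code points: in Python 2 a plain " ▲ " literal is a byte string, and ▲ (U+25B2) takes three bytes in UTF-8, while in Python 3 the same literal is text. A minimal sketch of that difference, written with a bytes literal so it should behave the same on 2.7 and 3.2 (the variable names are only for illustration):

```python
# UTF-8 bytes for " ▲ ": space, three bytes for U+25B2, space.
raw = b" \xe2\x96\xb2 "

# Decoding turns the bytes into text (unicode on Python 2, str on Python 3).
text = raw.decode("utf-8")

byte_count = len(raw)    # counts bytes
char_count = len(text)   # counts code points

print(byte_count)
print(char_count)
```

So len() is consistent in both versions; what differs is the type the literal produces.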

Great, I get two different string sizes. Hopefully wrapping the string in one of these wrappers I found on the net will help?

For tstr = text_type(" ▲ "):

$ python2.7 test.py 
Traceback (most recent call last): 
    File "test.py", line 21, in <module> 
    tstr = text_type(" ▲ ") 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128) 
$ python3.2 test.py 
▲ 
3

For tstr = u(" ▲ "):

$ python2.7 test.py 
Traceback (most recent call last): 
    File "test.py", line 21, in <module> 
    tstr = u(" ▲ ") 
    File "test.py", line 11, in u 
    return unicode(x) 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128) 
$ python3.2 test.py 
▲ 
3
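Both Python 2.7 tracebacks above look like the same problem: when no encoding is given, Python 2 falls back to ASCII to turn the byte string into text, and 0xe2 (the first UTF-8 byte of ▲) is not ASCII. A rough sketch that reproduces the same kind of error with an explicit decode, assuming the source bytes are UTF-8:

```python
raw = b" \xe2\x96\xb2 "  # the bytes behind the " ▲ " literal in a UTF-8 file

# Decoding as ASCII fails on the first non-ASCII byte.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc
    print(exc)

# Naming the real encoding works and yields three code points.
decoded = raw.decode("utf-8")
print(len(decoded))
```

The failing position reported by the exception is 1, matching the "position 1" in the tracebacks.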

For tstr = b(" ▲ "):

$ python2.7 test.py 
▲ 
5 
$ python3.2 test.py 
Traceback (most recent call last): 
    File "test.py", line 21, in <module> 
    tstr = b(" ▲ ") 
    File "test.py", line 17, in b 
    return codecs.latin_1_encode(x)[0] 
UnicodeEncodeError: 'latin-1' codec can't encode character '\u25b2' in position 1: ordinal not in range(256)

For tstr = binary_type(" ▲ "):

$ python2.7 test.py 
▲ 
5 
$ python3.2 test.py 
Traceback (most recent call last): 
    File "test.py", line 21, in <module> 
    tstr = binary_type(" ▲ ") 
TypeError: string argument without an encoding
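The two Python 3.2 failures seem to come from the opposite direction: there the literal is already text, Latin-1 only covers code points up to 255 so it cannot represent U+25B2, and bytes() refuses a text argument unless an encoding is named. A small sketch of both points, assuming UTF-8 is the encoding you actually want:

```python
# Build the text value from bytes so the snippet runs on Python 2.7 and 3.2 alike.
text = b" \xe2\x96\xb2 ".decode("utf-8")

# Latin-1 cannot encode U+25B2, which is why the b() wrapper fails on Python 3.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# An explicit UTF-8 encode works on both versions and gives the 5-byte form.
data = text.encode("utf-8")

print(latin1_ok)
print(len(data))
```

Either way, the byte form has length 5, so it is not the route to a length of 3.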

Well, that certainly makes things easy.

So, how do I get the same string length (3 in this case) in both Python 2.7 and 3.2?

2013-05-10 sdaau

Well, it turns out unicode() in Python 2.7 takes an encoding argument, and that apparently helps:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
if sys.version_info[0] < 3:
    text_type = unicode
    binary_type = str
    def b(x):
        return x
    def u(x):
        return unicode(x, "utf-8")  # decode the byte string as UTF-8
else:
    text_type = str
    binary_type = bytes
    import codecs
    def b(x):
        return codecs.latin_1_encode(x)[0]
    def u(x):
        return x

tstr = u(" ▲ ")

sys.stderr.write(tstr)
sys.stderr.write("\n")
sys.stderr.write(str(len(tstr)))
sys.stderr.write("\n")

Running it, I get what I need:

$ python2.7 test.py 
▲ 
3 
$ python3.2 test.py 
▲ 
3
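A slightly more general variant of the same idea is to decode only when the value is not already text, so a single helper accepts either kind of input on either version. This is just a sketch: to_text and PY2 are made-up names, and UTF-8 is assumed as the default encoding.

```python
import sys

PY2 = sys.version_info[0] < 3
# `unicode` only exists on Python 2; the conditional keeps Python 3 from evaluating it.
text_type = unicode if PY2 else str


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings (hypothetical helper)."""
    if isinstance(value, text_type):
        return value  # already text, nothing to do
    return value.decode(encoding)


# A byte string and an already-decoded string give the same length.
from_bytes = to_text(b" \xe2\x96\xb2 ")
from_text = to_text(from_bytes)

print(len(from_bytes))
print(len(from_text))
```

On Python 2.6+ another option is `from __future__ import unicode_literals` at the top of the file, which makes plain string literals text in the first place.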

2013-05-10 06:40:26 sdaau
