(Python 2.7) How to correctly tabulate Unicode data
I have this test:
# -*- coding: utf-8 -*-
import binascii
test_cases = [
    'aaaaa', # Normal bytestring
    'ááááá', # Normal bytestring, but with extended ascii. Since the file is utf-8 encoded, this is utf-8 encoded
    'ℕℤℚℝℂ', # Encoded unicode. The editor has encoded this, and it is defined as string, so it is left encoded by python
    u'aaaaa', # unicode object. The string itself is utf-8 encoded, as defined in the "coding" directive at the top of the file
    u'ááááá', # unicode object. The string itself is utf-8 encoded, as defined in the "coding" directive at the top of the file
    u'ℕℤℚℝℂ', # unicode object. The string itself is utf-8 encoded, as defined in the "coding" directive at the top of the file
]
FORMAT = '%-20s -> %2d %-20s %-30s %-30s'
for data in test_cases:
    try:
        hexlified = binascii.hexlify(data)
    except Exception:
        # hexlify fails on unicode objects that contain non-ASCII characters
        hexlified = None
    print FORMAT % (data, len(data), type(data), hexlified, repr(data))
It produces this output:
aaaaa -> 5 <type 'str'> 6161616161 'aaaaa'
ááááá -> 10 <type 'str'> c3a1c3a1c3a1c3a1c3a1 '\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1'
ℕℤℚℝℂ -> 15 <type 'str'> e28495e284a4e2849ae2849de28482 '\xe2\x84\x95\xe2\x84\xa4\xe2\x84\x9a\xe2\x84\x9d\xe2\x84\x82'
aaaaa -> 5 <type 'unicode'> 6161616161 u'aaaaa'
ááááá -> 5 <type 'unicode'> None u'\xe1\xe1\xe1\xe1\xe1'
ℕℤℚℝℂ -> 5 <type 'unicode'> None u'\u2115\u2124\u211a\u211d\u2102'
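As you can see, the columns are not aligned correctly for the strings that contain non-ASCII characters. This is because the length of those strings in bytes is greater than their number of Unicode characters. How can I tell print to count characters, rather than bytes, when padding the fields?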
Start by using characters instead of bytes.
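One way to read that suggestion, as a minimal sketch rather than a definitive fix: decode every byte string to a unicode object before padding, so the field width is counted in characters. The helper names `to_text` and `pad_cell` are hypothetical, and the sketch assumes the byte strings are UTF-8 encoded, as the "coding" directive in the question implies.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding byte strings with the given encoding."""
    if isinstance(value, bytes):  # on Python 2.7, bytes is an alias for str
        return value.decode(encoding)
    return value


def pad_cell(value, width, encoding='utf-8'):
    """Left-justify value in a field that is `width` characters wide."""
    return to_text(value, encoding).ljust(width)


byte_cell = (u'\u00e1' * 5).encode('utf-8')    # UTF-8 bytes of five 'á' characters
text_cell = u'\u2115\u2124\u211a\u211d\u2102'  # the five letters from the test

for cell in (byte_cell, text_cell):
    padded = pad_cell(cell, 20)
    # Both cells hold 5 characters and are padded to exactly 20.
    print(u'%d characters -> padded to %d' % (len(to_text(cell)), len(padded)))
```

Applying `to_text` to each value and using a unicode format string (`u'%-20s ...'`) should have the same effect with the original `FORMAT` approach. Note that this counts code points, not display columns, so wide East Asian or combining characters may still misalign in a terminal.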