Word count won't print foreign characters

How do I set this up so that it prints Chinese and accented characters?

from twill.commands import * 
from collections import Counter 

with open('names.txt') as inf: 
    words = (line.strip() for line in inf) 
    freqs = Counter(words) 
    print (freqs)

Source

2012-11-11 muchacho

Which Python version are you using? – Blckknght

http://stackoverflow.com/questions/3883573/encoding-error-in-python-with-chinese-characters – Sheena

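To handle Chinese characters correctly I would use codecs.open instead of plain open, and pass it the file's correct encoding.

For example, if you have a file "unicode.txt" containing the string "aèioሴ ሴ":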

>>> import codecs
>>> open('unicode.txt').read() # has utf-8 BOM 
'\xef\xbb\xbfa\xc3\xa8io\xe1\x88\xb4 \xe1\x88\xb4' 
>>> codecs.open('unicode.txt').read() #without encoding is the same as open 
'\xef\xbb\xbfa\xc3\xa8io\xe1\x88\xb4 \xe1\x88\xb4' 
>>> codecs.open('unicode.txt', encoding='utf-8').read() 
u'\ufeffa\xe8io\u1234 \u1234'

And with Counter you get:

>>> Counter(open('unicode.txt').read()) 
Counter({'\xe1': 2, '\x88': 2, '\xb4': 2, 'a': 1, '\xc3': 1, ' ': 1, 'i': 1, '\xa8': 1, '\xef': 1, 'o': 1, '\xbb': 1, '\xbf': 1}) 
>>> Counter(codecs.open('unicode.txt', encoding='utf-8').read()) 
Counter({u'\u1234': 2, u'a': 1, u' ': 1, u'i': 1, u'\xe8': 1, u'o': 1, u'\ufeff': 1})

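If by "how can I set it up to print Chinese characters" you mean that print(freqs) should show something like Counter({'不': 1}), then this is not possible in python2, whereas it is the default in python3.

In python2, the __str__ method of the Counter class calls the __repr__ method of the strings, so you will always see things like \u40ed instead of the real character: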

>>> Counter(u'不') 
Counter({u'\u4e0d': 1}) 
>>> repr(u'不') 
"u'\\u4e0d'"

In python3 all strings are Unicode, and the repr of '不' is "'不'":

>>> Counter('不') 
Counter({'不': 1}) 
>>> repr('不') 
"'不'"
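For comparison, python3 still offers the escaped form through the built-in ascii(), which can be handy for checking exactly which code points a string contains. A small sketch, python3 only:

```python
from collections import Counter

freqs = Counter("不")

readable = repr(freqs)  # non-ASCII characters are kept as they are
escaped = ascii(freqs)  # non-ASCII characters are escaped, like python2's repr

print(readable)
print(escaped)
```

Whether the first print displays the character correctly also depends on the terminal's encoding and font.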

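So, if you want a solution that works in both python2 and python3, you should create a function str_counter that simply returns the str of the Counter in python3, while in python2 it has to iterate over the key-value pairs and build the string representation itself: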

>>> import sys
>>> def str_counter(counter):
...     if sys.version_info.major > 2:
...         # python3, no need to do anything
...         return str(counter)
...     # python2: we manually create a unicode representation.
...     result = u'{%s}'
...     parts = [u'%s: %s' % (unicode(key), unicode(value)) for key, value in counter.items()]
...     return result % u', '.join(parts)
...
>>> print str_counter(Counter(u'不')) # python2 
{不: 1}
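An alternative that avoids the version check is to format each entry yourself, since repr() is only involved when the whole Counter is printed. The sketch below is one possible way to do that, not part of the answer above. The name format_counts is made up for illustration, and the sample data is invented. It was written with python3 in mind; the u'' literals and the coding line are there so it should also run on python2.7.

```python
# -*- coding: utf-8 -*-
from collections import Counter


def format_counts(counter):
    """Return one 'key: count' line per entry, most frequent first."""
    # %s formats the key as text directly, so no \uXXXX escapes are produced.
    lines = [u"%s: %d" % (key, count) for key, count in counter.most_common()]
    return u"\n".join(lines)


freqs = Counter([u"不", u"é", u"不"])
print(format_counts(freqs))
```

Printing one entry per line also tends to be easier to read than a single dict-like line when names.txt is long.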

Source

2012-11-11 08:40:57 Bakuriu
