2011-05-08 121 views
0
s=array1 #user inputs an array with text in it 
n=len(s) 
f=arange(0,26,1) 
import collections 
dict = collections.defaultdict(int) 
for c in s: 
    dict[c] += 1 

for c in f: 
    print c,dict[c]/float(n) 

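In the output, c is a number rather than a letter, and I don't know how to convert it back to a letter. (Title: Frequency analysis in Python - print letters with frequencies instead of numbers)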

Also, is there any way to put the frequencies/letters into an array so that they can be plotted in a histogram?

+0

What does the IntArrayToText call return? Is it a string? – 2011-05-08 03:48:52

Answers

1

To convert a number to the letter it represents, just use the built-in chr:

>>> chr(98) 
'b' 
>>> chr(66) 
'B' 
>>> 
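The loop in the question runs over the indices 0 to 25 rather than over character codes, so chr needs an offset. A minimal sketch, assuming index 0 stands for 'a' (the question does not say how the text was encoded) and using a hypothetical helper name, index_to_letter:

```python
def index_to_letter(i):
    """Map 0..25 to 'a'..'z'. Hypothetical helper; assumes 0 means 'a'."""
    if not 0 <= i < 26:
        raise ValueError("index must be in the range 0..25")
    # ord('a') is 97, so index 0 becomes chr(97) == 'a', and so on.
    return chr(ord('a') + i)

letters = [index_to_letter(i) for i in range(26)]
print(letters[0], letters[1], letters[25])  # a b z
```

In the question's loop, this would mean printing index_to_letter(c) in place of c.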
4

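It should be noted that you are not calling map with arguments of the right type (hence the TypeError). It takes a function and one or more iterables, and applies the function to their items. Your second argument is toChar[i], which would be a string. All iterables implement __iter__. To illustrate: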

>>> l, t = [],() 
>>> l.__iter__ 
<<< <method-wrapper '__iter__' of list object at 0x7ebcd6ac> 
>>> t.__iter__ 
<<< <method-wrapper '__iter__' of tuple object at 0x7ef6102c> 
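For comparison, this is roughly what a well-formed map call could look like. It is only a sketch: the list codes is made up for illustration, an integer is used as an obviously non-iterable second argument, and list(...) is there because map returns a lazy iterator on Python 3.

```python
codes = [97, 98, 99]  # made-up character codes, for illustration only

# map takes a function first, then one or more iterables.
letters = list(map(chr, codes))
print(letters)  # ['a', 'b', 'c']

# A non-iterable second argument (here a single int) raises TypeError.
error = None
try:
    map(chr, 97)
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)
```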

DTing's answer reminded me of collections.Counter:

>>> from collections import Counter 
>>> a = 'asdfbasdfezadfweradf' 
>>> dict((k, float(v)/len(a)) for k,v in Counter(a).most_common()) 
<<< 
{'a': 0.2, 
'b': 0.05, 
'd': 0.2, 
'e': 0.1, 
'f': 0.2, 
'r': 0.05, 
's': 0.1, 
'w': 0.05, 
'z': 0.05} 
+0

+1 I've never used that before, thanks! =) – DTing 2011-05-08 05:21:50

1
>>> a = "asdfbasdfezadfweradf" 
>>> import collections 
>>> counts = collections.defaultdict(int) 
>>> for letter in a: 
...  counts[letter]+=1 
... 
>>> print counts 
defaultdict(<type 'int'>, {'a': 4, 'b': 1, 'e': 2, 'd': 4, 'f': 4, 's': 2, 'r': 1, 'w': 1, 'z': 1}) 
>>> hist = dict((k, float(v)/len(a)) for k,v in counts.iteritems()) 
>>> print hist 
{'a': 0.2, 'b': 0.05, 'e': 0.1, 'd': 0.2, 'f': 0.2, 's': 0.1, 'r': 0.05, 'w': 0.05, 'z': 0.05} 
+1

Nice! Reminds me of 'collections.Counter'. – zeekay 2011-05-08 05:03:05

0

To put the frequencies/letters into an array:

hisArray = [dict[c]/float(n) for c in f] 
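To get one step closer to the histogram, the letters and their frequencies can be kept in two parallel lists. The sketch below is self-contained and makes a few assumptions that are not in the question: the input is a plain string (sample text borrowed from the other answers), only the lowercase letters a to z are counted, and the names text, letters and freqs are illustrative.

```python
import string
from collections import Counter

text = "asdfbasdfezadfweradf"  # sample input, assumed to be a str
counts = Counter(text)
n = len(text)

letters = list(string.ascii_lowercase)  # 'a'..'z', usable as x-axis labels
# Counter returns 0 for letters that never occur. On Python 2, use
# counts[ch] / float(n) to avoid integer division.
freqs = [counts[ch] / n for ch in letters]

for ch, fr in zip(letters, freqs):
    if fr:
        print(ch, fr)
```

The two lists can then be handed to a plotting library. With matplotlib, for instance, a bar chart along the lines of plt.bar(letters, freqs) would be one option.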
3

If you are using Python 2.7 or later, you can use collections.Counter:

Python 2.7+

>>> import collections 
>>> s = "I want to count frequencies." 
>>> counter = collections.Counter(s) 
>>> counter 
Counter({' ': 4, 'e': 3, 'n': 3, 't': 3, 'c': 2, 'o': 2, 'u': 2, 'a': 1, 'f': 1, 'I': 1,  'q': 1, 'i': 1, 's': 1, 'r': 1, 'w': 1, '.': 1}) 
>>> n = sum(counter.values()) * 1.0 # Convert to float so division returns float. 
>>> n 
28.0
>>> [(char, count/n) for char, count in counter.most_common()] 
[(' ', 0.14285714285714285), ('e', 0.10714285714285714), ('n', 0.10714285714285714), ('t', 0.10714285714285714), ('c', 0.07142857142857142), ('o', 0.07142857142857142), ('u', 0.07142857142857142), ('a', 0.03571428571428571), ('f', 0.03571428571428571), ('I', 0.03571428571428571), ('q', 0.03571428571428571), ('i', 0.03571428571428571), ('s', 0.03571428571428571), ('r', 0.03571428571428571), ('w', 0.03571428571428571), ('.', 0.03571428571428571)] 

Python 3+

>>> import collections 
>>> s = "I want to count frequencies." 
>>> counter = collections.Counter(s) 
>>> counter 
Counter({' ': 4, 'e': 3, 'n': 3, 't': 3, 'c': 2, 'o': 2, 'u': 2, 'a': 1, 'f': 1, 'I': 1,  'q': 1, 'i': 1, 's': 1, 'r': 1, 'w': 1, '.': 1}) 
>>> n = sum(counter.values()) 
>>> n 
28 
>>> [(char, count/n) for char, count in counter.most_common()] 
[(' ', 0.14285714285714285), ('e', 0.10714285714285714), ('n', 0.10714285714285714), ('t', 0.10714285714285714), ('c', 0.07142857142857142), ('o', 0.07142857142857142), ('u', 0.07142857142857142), ('a', 0.03571428571428571), ('f', 0.03571428571428571), ('I', 0.03571428571428571), ('q', 0.03571428571428571), ('i', 0.03571428571428571), ('s', 0.03571428571428571), ('r', 0.03571428571428571), ('w', 0.03571428571428571), ('.', 0.03571428571428571)] 

This also returns the (char, frequency) tuples in descending order of frequency.
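Note that the counts above include the spaces and the full stop. If only letters matter, one possible variation is to filter and lowercase the text first. This is a sketch for Python 3, not part of the original answer; the helper name letter_frequencies is hypothetical.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, frequency) pairs, most frequent first.

    Hypothetical helper: ignores case and skips non-letter characters.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return []
    return [(ch, count / total) for ch, count in Counter(letters).most_common()]

result = letter_frequencies("I want to count frequencies.")
print(result[:3])
```

Because the denominator is now the number of letters rather than the length of the string, the frequencies still sum to 1.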