2011-12-22 48 views
1

How can I store cumulative output in a list?

I use the following code to get the letter frequencies in a text:

for s in 'abcdefghijklmnopqrstuvwxyz ':
    count = 0
    for char in rawpunct.lower():
        if s == char:
            count += 1
    result = s, '%.3f' % (count*100/len(rawpunct.lower()))
    f_list.append(result)

The result is:

['0.061', '0.012', '0.017', '0.030', '0.093', '0.016', '0.016', 
'0.049', '0.050', '0.001', '0.006', '0.034', '0.018', '0.052', '0.055', 
'0.013', '0.001', '0.041', '0.050', '0.069', '0.021', '0.007', '0.017', 
'0.001', '0.013', '0.000', '0.159'] 

But I want to store the cumulative frequencies, i.e. create this list:

['0.061', '0.073', '0.090', '0.120' ............ ]

Does anyone know how to do this?

+1

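This isn't what you asked, but note that this code reads through the entire text 27 times, when you could get the same result by reading through it just once. Simply create a dictionary that maps characters to counts, like `counts = {'a': 0, 'b': 0, ...}`, or equivalently `counts = dict((c, 0) for c in 'abcdefghijklmnopqrstuvwxyz ')`. Then pass through the text once; for each `c` in the text, do `counts[c] += 1`. At the end you can create a new cumulative list using the methods described below. – senderle 2011-12-22 15:53:54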

+0

Also useful for operations like this: [`defaultdict`](http://docs.python.org/library/collections.html#collections.defaultdict) and [`Counter`](http://docs.python.org/library/collections.html#collections.Counter). – senderle 2011-12-22 15:54:04
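The single-pass idea from the comments could look like the sketch below, written for Python 3. The `sample` string and the `letter_frequencies` helper name are made up for illustration; `collections.Counter` is the standard-library class linked in the comment above.

```python
from collections import Counter

ALPHABET = 'abcdefghijklmnopqrstuvwxyz '

def letter_frequencies(text):
    """Return the relative frequency of each ALPHABET character in text."""
    lowered = text.lower()
    counts = Counter(lowered)  # one pass over the text
    total = len(lowered)
    if total == 0:
        return [0.0 for _ in ALPHABET]
    # a Counter returns 0 for characters that never occurred
    return [counts[c] / total for c in ALPHABET]

sample = 'Hello World'  # hypothetical stand-in for rawpunct
freqs = letter_frequencies(sample)
print(['%.3f' % f for f in freqs])
```

Characters outside `ALPHABET` still count towards the text length here, as in the question's code, so the frequencies need not add up to 1.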

Answers

2

Just for the fun of a one-liner:

original = ['0.061', '0.012', '0.017', '0.030', '0.093', '0.016', '0.016', 
'0.049', '0.050', '0.001', '0.006', '0.034', '0.018', '0.052', '0.055', 
'0.013', '0.001', '0.041', '0.050', '0.069', '0.021', '0.007', '0.017', 
'0.001', '0.013', '0.000', '0.159'] 

result = [sum(float(item) for item in original[:rank + 1]) for rank in range(len(original))]

>>> [0.061, 0.073, 0.09, 0.12, 0.213, 0.22899999999999998, 0.245, 0.294, 0.344, 0.345, 0.351, 0.385, 0.403, 0.455, 0.51, 0.523, 0.524, 0.5650000000000001, 0.6150000000000001, 0.6840000000000002, 0.7050000000000002, 0.7120000000000002, 0.7290000000000002, 0.7300000000000002, 0.7430000000000002, 0.7430000000000002, 0.9020000000000002] 
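The one-liner re-sums a growing slice for every position, so its work grows quadratically with the length of the list. On Python 3.2 or later, `itertools.accumulate` yields the same running sums in a single pass. A minimal sketch, using only the first four entries of the list above:

```python
from itertools import accumulate

# first four entries of the list above, shortened for illustration
original = ['0.061', '0.012', '0.017', '0.030']

# accumulate() yields the running sums one at a time
cumulative = list(accumulate(float(item) for item in original))

# format only at the end, so rounding noise does not show up in the output
formatted = ['%.3f' % value for value in cumulative]
print(formatted)
```

Formatting with `'%.3f'` at the very end also hides float artifacts such as `0.22899999999999998`.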
1
# assumes result is a number (a float), not a formatted string
if not f_list:
    f_list.append(result)
else:
    f_list.append(f_list[-1] + result)
1
text = rawpunct.lower()
f_list = [0.0]  # placeholder so that f_list[-1] always exists
for s in 'abcdefghijklmnopqrstuvwxyz ':
    count = 0
    for char in text:
        if s == char:
            count += 1
    result = count * 100.0 / len(text)  # keep a float here; format only for display
    f_list.append(result + f_list[-1])
f_list = f_list[1:]  # drop the placeholder
2
letters = 'abcdefghijklmnopqrstuvwxyz '
text = rawpunct.lower()
counts = dict.fromkeys(letters, 0)
for char in text:  # single pass over the text
    try:
        counts[char] += 1
    except KeyError:
        # this character in rawpunct should not be counted!
        pass
total = float(len(text))
f_list = [0.0]  # the leading zero is dropped again below
for s in letters:
    f_list.append(f_list[-1] + counts[s] / total)
str_list = ['{0:.3f}'.format(f) for f in f_list[1:]]

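f_list is a list of floats (it is much easier to compute sums with floats than with their string representations!). At the end, I create str_list, a list of the string representations of these floats. Since you don't want your list to start with zero, that entry is dropped at the end (only f_list[1:] is used).

If your input text is long, this solution is faster because it reads the text only once.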

3

You can `import numpy`, then turn the results into an array with `results = numpy.array(result)`, and finally compute `f_list = numpy.cumsum(results)`.
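A short sketch of that NumPy approach, assuming NumPy is installed and that the frequencies are converted to floats first (the list in the question holds strings). The four-element input is a shortened example, not the full data:

```python
import numpy as np

freq_strings = ['0.061', '0.012', '0.017', '0.030']  # shortened example input

# convert the strings to floats before building the array
results = np.array([float(s) for s in freq_strings])
f_list = np.cumsum(results)  # running total, returned as a NumPy array

formatted = ['%.3f' % value for value in f_list]
print(formatted)
```

`f_list` is a NumPy array here; call `f_list.tolist()` if a plain Python list is needed.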

0

A cumsum version using reduce:

In [1]: x = [1,2,3] 
In [2]: reduce(lambda acc, x: acc + [acc[-1] + x], x[1:], x[:1]) 
Out[2]: [1, 3, 6] 

It also works for an empty list:

In [3]: x = [] 
In [4]: reduce(lambda acc, x: acc + [acc[-1] + x], x[1:], x[:1]) 
Out[4]: [] 
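On Python 3, `reduce` is no longer a builtin and has to be imported from `functools`. The same idea wrapped in a function might look like the sketch below; the name `cumsum_reduce` is made up for illustration.

```python
from functools import reduce

def cumsum_reduce(values):
    """Cumulative sums of values, built with reduce; [] for empty input."""
    # start from the first element (or an empty list) and extend it step by step
    return reduce(lambda acc, v: acc + [acc[-1] + v], values[1:], values[:1])

print(cumsum_reduce([1, 2, 3]))
print(cumsum_reduce([]))
```

Each step builds a new list with `acc + [...]`, so this is best kept for short inputs such as the 27 letter frequencies.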
0

I assume rawpunct is the string that contains your text. In my suggestion I replaced it with a literal text:

from string import ascii_lowercase

text = 'Some arbitrary Text with NonNSense! @#!.+-'.lower()
chmap = ascii_lowercase + ' '
cooked_text = ''.join([i for i in text if i in chmap])
chdict = dict.fromkeys(chmap, 0)       # set up the totals dict
frequencies = dict.fromkeys(chmap, 0)  # set up the fractions dict

for ch in cooked_text:  # totals per char
    chdict[ch] += 1

for char in chdict.keys():  # relative to text length
    frequencies[char] = float(chdict[char]) / len(cooked_text)

frequency_list = [frequencies[char] for char in chmap]
frequency_strlist = ['%.3f' % f for f in frequency_list]
print(frequency_strlist)
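
The code above stops at the per-letter frequencies. To get the cumulative list the question asks for, a running total can be kept while walking the list. A minimal sketch for Python 3, with a hypothetical three-element `frequency_list` standing in for the one computed above:

```python
frequency_list = [0.5, 0.25, 0.25]  # hypothetical frequencies for illustration

cumulative = []
running_total = 0.0
for f in frequency_list:
    running_total += f  # add this letter's share to the total so far
    cumulative.append(running_total)

cumulative_strlist = ['%.3f' % c for c in cumulative]
print(cumulative_strlist)
```

Because the text was filtered down to the characters in `chmap` first, the last cumulative value should come out at (approximately) 1.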