Counting the frequency of letters in a text file

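In Python, how can I iterate over a text file and count the number of occurrences of each letter? I realise I could use a 'for x in file' statement to go through it and then set up 26 if/elif statements, but surely there is a better way to do it?

Thanks.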

2012-09-09 Muzz5

Expecting something else? http://stackoverflow.com/search?q=[python]+count –

[Here](http://stackoverflow.com/a/5148987/866571) is a similar question. – Mayura

Possible duplicate of [Letter frequency in Python](http://stackoverflow.com/questions/5148903/letter-frequency-in-python) –

Use collections.Counter():

from collections import Counter

with open(file) as f:  # file: path to the text file
    c = Counter()
    for x in f:
        c += Counter(x.strip())  # add this line's character counts

As @mgilson pointed out, if the file is not that large, you can simply do:

c = Counter(f.read().strip())

Example:

>>> c = Counter()
>>> c += Counter('aaabbbcccddd eee fff ggg')
>>> c
Counter({'a': 3, ' ': 3, 'c': 3, 'b': 3, 'e': 3, 'd': 3, 'g': 3, 'f': 3})
>>> c += Counter('aaabbbccc')
>>> c
Counter({'a': 6, 'c': 6, 'b': 6, ' ': 3, 'e': 3, 'd': 3, 'g': 3, 'f': 3})
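If only letters matter (the example above also counts spaces), one possible variation is to filter each line before updating the Counter. This is a minimal sketch, not part of the original answer: the helper name count_letters_in_lines and the sample lines are made up for illustration, and it assumes that case should be ignored.

```python
from collections import Counter


def count_letters_in_lines(lines):
    """Count alphabetic characters, ignoring case, across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # Counter.update adds counts for every item the generator yields.
        counts.update(ch for ch in line.lower() if ch.isalpha())
    return counts


# Stand-in for an open file object, which also yields one line at a time.
sample_lines = ["Aaa bbb\n", "CCC ddd eee\n"]
letter_counts = count_letters_in_lines(sample_lines)
print(letter_counts.most_common(3))
```

Since an open file yields its lines when iterated, the same function should work when called as count_letters_in_lines(f) inside a with open(...) block.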

Or use the string method count():

from string import ascii_lowercase  # ascii_lowercase == 'abcdefghijklmnopqrstuvwxyz'

with open(file) as f:
    text = f.read().strip()
    dic = {}
    for x in ascii_lowercase:
        dic[x] = text.count(x)  # occurrences of each lowercase letter

2012-09-09 19:26:59

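For that matter, 'Counter(f.read())' should do the trick if the OP can read the whole file into memory. – mgilson

Works beautifully. Thanks! However, the Counter(f.read()) method throws some error about mixed data. My file is probably about 1000 characters long, so size shouldn't be a problem. – Muzz5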

Use a dictionary: basically letters[char]++
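Python has no ++ operator, so the idea above has to be spelled out slightly differently. A minimal sketch, assuming the text is already in memory (the sample string is made up for illustration), uses dict.get with a default of 0:

```python
letters = {}
for char in 'aaabbb ccc':
    # dict.get returns 0 for characters that have not been seen yet.
    letters[char] = letters.get(char, 0) + 1

print(letters)
```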

2012-09-09 19:26:29 djechlin

Counter is a good way to do this, but Counter is only available in 3.1 and later, plus 2.7.

If you are on 3.0 or 2.[56], you should probably use collections.defaultdict(int) instead.
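A rough sketch of the defaultdict(int) approach might look like the following. The function name count_chars and the sample string are illustrative only and do not come from the answer.

```python
from collections import defaultdict


def count_chars(text):
    """Count every character in text using a defaultdict."""
    counts = defaultdict(int)  # missing keys start at int() == 0
    for ch in text:
        counts[ch] += 1
    return counts


counts = count_chars('aaabbbccc')
print(dict(counts))
```

Note that looking up a missing key in a defaultdict inserts it with a value of 0, so converting to a plain dict before handing the result on can avoid surprises.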

2012-09-09 19:38:25 dstromberg

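This approach builds a dictionary histogram of every character, which can be used to create a bar chart or similar. If you want to restrict it to letters or some subset, you will need to add an extra condition, or filter freqs at the end.

freqs = {}
for line in file_list:
    for char in line:
        if char in freqs:
            freqs[char] += 1
        else:
            freqs[char] = 1

print(freqs)

I'm assuming you have already opened the file and populated *file_list* with its contents.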
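The "filter freqs at the end" option could be sketched as below. The file_list contents here are a made-up stand-in for the real lines, the restriction to ASCII letters is an assumption, and the text bar chart is just one way to display the result.

```python
from string import ascii_letters

# Hypothetical stand-in for the lines read from the file.
file_list = ['aaabbb\n', 'ccc 123\n']

freqs = {}
for line in file_list:
    for char in line:
        freqs[char] = freqs.get(char, 0) + 1

# Keep only ASCII letters, dropping digits, spaces and newlines.
letter_freqs = {char: n for char, n in freqs.items() if char in ascii_letters}

# A crude text bar chart, one row per letter.
for char in sorted(letter_freqs):
    print(char, '#' * letter_freqs[char])
```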

2012-09-09 19:40:18

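'has_key()' is deprecated; use 'in' instead. –

@james-bradbury it should be 'if char in freqs.keys()' rather than 'if char in freqs'. – MaxMarchuk

@MaxMarchuk. If we are talking about Python 2.x then you are correct, but in Python 3 you can use the simpler, more readable form to iterate over the keys. –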

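Basically, with no imports: is_letter is a function that decides whether something counts as a letter, so you can handle more than just the usual English alphabet.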

def add_or_init(dictionary, c):
    if c in dictionary:
        dictionary[c] += 1
    else:
        dictionary[c] = 1

def count_one_letter(dictionary, c, is_letter):
    if is_letter(c):
        add_or_init(dictionary, c)

def count_letters(dictionary, string, is_letter):
    for c in string:
        count_one_letter(dictionary, c, is_letter)
    return dictionary

# count all characters
count_letters(dict(), 'aaabbbcccddd eee fff ggg', lambda x: True)
# => {'a': 3, ' ': 3, 'c': 3, 'b': 3, 'e': 3, 'd': 3, 'g': 3, 'f': 3}
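To illustrate how the predicate changes what gets counted, here is a compact, self-contained sketch of the same idea. The function name count_matching and the sample strings are hypothetical; str.isalpha is used as one possible predicate that also accepts non-English letters.

```python
def count_matching(text, is_letter):
    """Count the characters of text for which is_letter(char) is true."""
    counts = {}
    for c in text:
        if is_letter(c):
            counts[c] = counts.get(c, 0) + 1
    return counts


# str.isalpha accepts letters outside a-z; '\u00f1' is n with a tilde.
all_letters = count_matching('a\u00f1b \u00f1 42', str.isalpha)

# A custom predicate restricting the count to lowercase vowels.
vowels = count_matching('banana bread', lambda c: c in 'aeiou')

print(all_letters)
print(vowels)
```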

2012-09-09 19:53:41 user1651640
