Determining relative letter frequency

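I need to create a function that takes a text file as input and returns a vector of size 26, with the frequency of each character (a to z) expressed as a percentage. It must be case-insensitive. All other letters (e.g. å) and symbols should be ignored.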

I have tried using some of the answers here, particularly 'Jacob's' answer: Determining Letter Frequency Of Cipher Text

This is my code so far:

def letterFrequency(filename): 
    #f: the text file is converted to lowercase 
    f=filename.lower() 
    #n: the sum of the letters in the text file 
    n=float(len(f)) 
    import collections 
    dic=collections.defaultdict(int) 
    #the absolute frequencies 
    for x in f: 
     dic[x]+=1 
    #the relative frequencies 
    from string import ascii_lowercase 
    for x in ascii_lowercase: 
     return x,(dic[x]/n)*100

For example, if I try this:

print(letterFrequency('I have no idea')) 
>>> ('a',14.285714)

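Why doesn't it print the relative values for all the letters, including letters that are not in the string, such as z in my example?

And how do I get my code to print a vector of size 26?

Edit: I have tried using Counter, but it prints ('a': 14.2857) with the letters in mixed order. I only need the relative frequencies of the letters, in alphabetical order!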

Source

2015-06-14 Gliz

for x in ascii_lowercase: 
    return x,(dic[x]/n)*100

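The function returns during the first iteration of the loop, so only the value for 'a' ever comes back.

Instead, change it to build and return a list of tuples: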

letters = [] 
for x in ascii_lowercase: 
    letters.append((x,(dic[x]/n)*100)) 
return letters
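Put together, the whole function might look like the sketch below. It keeps the question's approach (a defaultdict, with every character of the input counted in the total), and the name letter_frequency_pairs is only a placeholder for illustration. It assumes the input string is not empty, since an empty string would divide by zero.

```python
from collections import defaultdict
from string import ascii_lowercase

def letter_frequency_pairs(text):
    """Return a list of (letter, percent) tuples for a..z, in order."""
    lowered = text.lower()
    # As in the question, the total includes spaces and symbols.
    n = float(len(lowered))
    counts = defaultdict(int)
    for ch in lowered:
        counts[ch] += 1
    letters = []
    for x in ascii_lowercase:
        # Letters that never occur get 0 from the defaultdict.
        letters.append((x, counts[x] / n * 100))
    return letters

result = letter_frequency_pairs('I have no idea')
print(len(result))  # 26 entries, one per letter
print(result)
```

Because the loop now runs to completion before the return, letters that are absent from the text (such as z) show up with 0.0.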

Source

2015-06-14 12:57:33

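Thank you, this works.. but how do I remove the commas in the printed result? It prints [number, number, number], but I really want to get [number number], like an array – Gliz

@Gliz Use 'letters.append((dic[x]/n)*100)' and print it with 'for e in letterFrequency('I have no idea'): print(e, end=' ')'. Is that what you want? –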
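A rough sketch of what the comment suggests: return only the 26 percentages, then join them with spaces when printing. The function name letter_percentages and the two-decimal formatting are my own choices for illustration, and Counter is used here just to keep the example short.

```python
from collections import Counter
from string import ascii_lowercase

def letter_percentages(text):
    """Return 26 percentages for a..z, in alphabetical order."""
    lowered = text.lower()
    counts = Counter(lowered)
    n = len(lowered)  # total includes spaces and symbols, as in the question
    return [counts[x] / n * 100 for x in ascii_lowercase]

values = letter_percentages('I have no idea')
# Join with spaces so no commas or brackets appear in the output.
print(' '.join('{:.2f}'.format(v) for v in values))
```

The commas only come from printing a Python list directly; the list itself is still an ordinary list of 26 floats.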

The problem is in your for loop:

for x in ascii_lowercase: 
    return x,(dic[x]/n)*100

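It returns a tuple, so the function stops at the first iteration.

Use yield instead of return, which turns the function into a generator that works as expected.

Another way to make it work is to return a list comprehension: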
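For instance, a generator version could look roughly like this. It is the question's code with return swapped for yield; the name iter_letter_frequency is made up for the example, and a non-empty input is assumed.

```python
from collections import defaultdict
from string import ascii_lowercase

def iter_letter_frequency(text):
    """Yield (letter, percent) pairs for a..z, one at a time."""
    lowered = text.lower()
    n = float(len(lowered))
    counts = defaultdict(int)
    for ch in lowered:
        counts[ch] += 1
    for x in ascii_lowercase:
        # yield hands back one pair and resumes here on the next request
        yield x, counts[x] / n * 100

# A generator is lazy, so wrap it in list() to see all 26 pairs at once.
pairs = list(iter_letter_frequency('I have no idea'))
print(pairs)
```

Note that printing the generator object itself only shows something like a generator repr, so you need list() or a for loop to get the values out.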

return [(x,(dic[x]/n)*100) for x in ascii_lowercase]

However, if your goal is to count items, I recommend using the Counter class:

def letterFrequency(txt): 
    from collections import Counter 
    from string import ascii_lowercase 
    c=Counter(txt.lower()) 
    n=len(txt)/100. 
    return [(x, c[x]/n) for x in ascii_lowercase]

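As you can see, c=Counter(txt.lower()) does all the work of iterating over the characters and keeping the counts. A Counter behaves much like a defaultdict: looking up a missing key gives 0.

Note that Counter also has some useful methods, such as c.most_common() ...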
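One thing the versions above do not do is ignore spaces and other characters in the total, which the question asks for. A possible variant, under the assumption that only a-z should count toward the denominator (so the 26 values add up to 100), is sketched below; the names letter_counts and letter_frequency_letters_only are hypothetical.

```python
from collections import Counter
from string import ascii_lowercase

def letter_counts(text):
    """Count only the letters a..z; spaces, digits, å etc. are skipped."""
    return Counter(ch for ch in text.lower() if ch in ascii_lowercase)

def letter_frequency_letters_only(text):
    """Return 26 percentages for a..z, relative to the number of a..z letters."""
    counts = letter_counts(text)
    total = sum(counts.values())
    if total == 0:
        # No letters at all: avoid dividing by zero.
        return [0.0] * 26
    return [counts[x] * 100.0 / total for x in ascii_lowercase]

freqs = letter_frequency_letters_only('I have no idea')
print(freqs)
# most_common(3) lists the three most frequent letters with their counts.
print(letter_counts('I have no idea').most_common(3))
```

This gives different numbers from the earlier code for the same input, because the spaces no longer dilute the percentages.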

Source

2015-06-14 13:02:29 fferri
