2010-12-05

I'm applying Huffman coding to arbitrary .txt files, so first I need to analyze the text file: read it, then count its characters. I need the output as a table:


letter | frequency (how many times it occurs) | Huffman code (this will come later)


I started with:

with open('test.txt', 'r') as f:  # open test.txt
    for line in f:
        print(line.rstrip('\n'))  # just to check that everything works

How do I order the characters read from the file alphabetically?

import collections

frequencies = collections.defaultdict(int)  # missing keys start at 0
with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1

Thanks very much!


Well, I tried this:
frequencies = collections.defaultdict(int) 
with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1


frequencies = [(count, char) for char, count in frequencies.iteritems()] 
frequencies.sort(key=operator.itemgetter(1)) 

But the compiler gives me an "error".

I need the alphabetical order inside the for loop, not at the end on frequencies...
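Characters arrive in file order while the file is read, so the sorting has to happen after counting. A minimal sketch, assuming the counts fit in memory as a dict: count first, then loop over `sorted(frequencies)`, which yields the keys in code-point order (so `'\n'` and uppercase letters come before lowercase). `count_chars` and the sample lines are hypothetical stand-ins for reading `test.txt`.

```python
import collections


def count_chars(lines):
    """Count every character in an iterable of lines (e.g. an open file)."""
    # defaultdict(int) supplies 0 for unseen keys, so += 1 works on first sight.
    frequencies = collections.defaultdict(int)
    for line in lines:
        for char in line:
            frequencies[char] += 1
    return frequencies


# Hypothetical stand-in for open("test.txt"): any iterable of strings works.
sample_lines = ["hello\n", "world\n"]
frequencies = count_chars(sample_lines)

# Iterating over sorted(dict) visits the keys in ascending order.
for char in sorted(frequencies):
    print(repr(char), '|', frequencies[char])
```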


Any objection to retagging this as homework? – 2010-12-05 00:49:21


See my updated answer. – aaronasterling 2010-12-05 02:11:13


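I see the problem with what you tried: the last two lines have leading space characters, and the "import collections" and "import operator" statements are missing. Fix those and it should work fine. – martineau 2010-12-05 02:43:14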
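Following that comment, a sketch of the attempted code with both imports added and consistent indentation. `.items()` is used instead of `.iteritems()` so it also runs on Python 3; the function name `alphabetical_frequencies` and the sample input are hypothetical.

```python
import collections
import operator


def alphabetical_frequencies(lines):
    """Return (count, char) pairs sorted alphabetically by char."""
    frequencies = collections.defaultdict(int)
    for line in lines:
        for char in line:
            frequencies[char] += 1
    # Build (count, char) tuples, then sort on index 1 (the character).
    pairs = [(count, char) for char, count in frequencies.items()]
    pairs.sort(key=operator.itemgetter(1))
    return pairs


pairs = alphabetical_frequencies(["abracadabra"])
for count, char in pairs:
    print(char, '|', count)
```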

Answers


To get your frequency table, I would use a defaultdict. This iterates over the data only once.

import collections
import operator

frequencies = collections.defaultdict(int)
with open(filename) as f_in:  # filename: path to the text file, e.g. 'test.txt'
    for line in f_in:
        for char in line:
            frequencies[char] += 1

# (count, char) pairs; sorting on index 1 orders them alphabetically by character
frequencies = [(count, char) for char, count in frequencies.items()]
frequencies.sort(key=operator.itemgetter(1))
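Because each entry is a `(count, char)` tuple, the choice of sort key decides the order: `itemgetter(1)` gives alphabetical order, while a plain sort compares the count first (ties broken by character), which is the least-frequent-first order a Huffman tree builder consumes symbols in. A small sketch; the `pairs` data here is hypothetical.

```python
import operator

# Hypothetical (count, char) pairs, shaped like the list built above.
pairs = [(5, 'a'), (2, 'b'), (1, 'c'), (1, 'd'), (2, 'r')]

by_letter = sorted(pairs, key=operator.itemgetter(1))  # alphabetical by char
by_count = sorted(pairs)  # count first, ties broken by char

for count, char in by_letter:
    print('{} | {}'.format(char, count))
```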
Another answer:
with open('test.txt') as f:
    data = f.read()
# count each distinct character (one data.count(c) scan per character)
table = {c: data.count(c) for c in set(data)}
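The one-liner calls `data.count(c)` once per distinct character, so it scans the whole text once for each of them. `collections.Counter` builds the same table in a single pass. A sketch comparing the two; the string `data` is a hypothetical stand-in for `f.read()`.

```python
import collections

data = "mississippi"  # hypothetical stand-in for f.read()

# One full scan of data per distinct character.
table = dict((c, data.count(c)) for c in set(data))

# The same mapping, built in a single pass over data.
counter = collections.Counter(data)

print(table == dict(counter))
```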
Another answer:

I made this solution using collections.Counter():

import re
import collections


if __name__ == '__main__':
    is_letter = re.compile('[A-Za-z]')

    frequencies = collections.Counter()
    with open(r'text.txt') as f_in:
        for line in f_in:
            for char in line:
                if is_letter.match(char):
                    frequencies[char.lower()] += 1

    # Sort characters alphabetically
    characters = [x[0] for x in frequencies.most_common()]
    characters.sort()
    for c in characters:
        print(c + ' | ' + str(frequencies[c]))
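Since a `Counter` is a dict, the `most_common()` call followed by `sort()` in the code above can be replaced by `sorted(frequencies.items())`, which yields `(letter, count)` pairs in alphabetical order directly. A sketch with the same regex filter; the function name `letter_frequencies` and the sample line are hypothetical.

```python
import collections
import re

is_letter = re.compile('[A-Za-z]')


def letter_frequencies(lines):
    """Count ASCII letters case-insensitively in an iterable of lines."""
    frequencies = collections.Counter()
    for line in lines:
        for char in line:
            if is_letter.match(char):
                frequencies[char.lower()] += 1
    return frequencies


frequencies = letter_frequencies(["Hello, World!\n"])

# (letter, count) pairs in alphabetical order; no most_common() step needed.
rows = sorted(frequencies.items())
for letter, count in rows:
    print('%s | %d' % (letter, count))
```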

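The regular expression is_letter filters out everything except the characters we are interested in (ASCII letters, counted in lowercase). The output looks like this: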

a | 177 
b | 29 
c | 7 
d | 167 
e | 374 
f | 58 
g | 100 
h | 44 
i | 135 
j | 21 
k | 64 
l | 125 
m | 85 
n | 191 
o | 105 
p | 34 
r | 185 
s | 130 
t | 146 
u | 34 
v | 68 
x | 1 
y | 14
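The question's third column, the Huffman code, is left for later. For completeness, here is a minimal sketch of the standard greedy Huffman construction using `heapq`. It assumes the frequency table is a plain dict of symbol to count, like the ones built above, and `huffman_codes` is a hypothetical helper. Tie-breaking between equal counts means the exact bit patterns (and even individual code lengths) can differ from other implementations. The total encoded length is still the same minimum.

```python
import heapq
import itertools


def huffman_codes(frequencies):
    """Return {symbol: bitstring} for a dict mapping symbol -> count."""
    if not frequencies:
        return {}
    tiebreak = itertools.count()  # unique ints keep heap tuples comparable on ties
    heap = [(count, next(tiebreak), {symbol: ''})
            for symbol, count in frequencies.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {symbol: '0' for symbol in heap[0][2]}
    while len(heap) > 1:
        count1, _, codes1 = heapq.heappop(heap)  # least frequent subtree
        count2, _, codes2 = heapq.heappop(heap)  # next least frequent subtree
        # Prefix '0' onto one subtree's codes and '1' onto the other's.
        merged = {s: '0' + c for s, c in codes1.items()}
        merged.update({s: '1' + c for s, c in codes2.items()})
        heapq.heappush(heap, (count1 + count2, next(tiebreak), merged))
    return heap[0][2]


frequencies = {'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1}
codes = huffman_codes(frequencies)
for letter in sorted(codes):
    print('%s | %d | %s' % (letter, frequencies[letter], codes[letter]))
```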