Defining a word as 2 letters or more in Python 2.6

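I have a Python script that I'm writing for a class assignment. It counts the 10 most frequent words in a text document and displays each word with its frequency. I got that part of the script working fine, but the assignment says a word is defined as 2 letters or more. For some reason I can't seem to define a word as 2 letters or more, and when I run the script, nothing happens.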

# Most Frequent Words: 
from string import punctuation 
from collections import defaultdict 

def sort_words(x, y): 
    return cmp(x[1], y[1]) or cmp(y[0], x[0]) 

number = 10 
words = {} 

words_gen = (word.strip(punctuation).lower() for line in open("charactermask.txt") 
              for word in line.split()) 
words = defaultdict(int) 
for word in words_gen: 
    words[word] +=1 

letters = len(word) 

while letters >= 2: 
    top_words = sorted(words.iteritems(), 
         key=lambda(word, count): (-count, word))[:number] 

for word, frequency in top_words: 
    print "%s: %d" % (word, frequency)

Source

2012-09-17 Ty Bailey

I would refactor the code ~~and use a collections.Counter object~~:

import collections 
import string 

with open("charactermask.txt") as f: 
    words = [x.strip(string.punctuation).lower() for x in f.read().split()] 

# Count only words that are at least 2 characters long.
counter = collections.defaultdict(int)
for word in words:
    if len(word) >= 2:
        counter[word] += 1
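
The refactoring above stops at the counting step. A minimal sketch of the full pipeline, including the top-10 ranking from the question, might look like the following. The function name `top_words`, its parameters, and the sample text are hypothetical and only serve as illustration; the code avoids `print` statements and `iteritems()` so it should run on Python 2.6 as well as newer versions.

```python
import collections
import string


def top_words(text, number=10, min_length=2):
    """Return the `number` most frequent words of at least `min_length` characters."""
    counter = collections.defaultdict(int)
    for raw in text.split():
        word = raw.strip(string.punctuation).lower()
        # The length check runs once per word, inside the loop.
        if len(word) >= min_length:
            counter[word] += 1
    # Sort by descending count, then alphabetically to break ties.
    ranked = sorted(counter.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:number]


# Hypothetical sample text standing in for the contents of the input file.
sample = "A cat saw a dog. The dog saw the cat, and a bird saw I."
for word, frequency in top_words(sample):
    print("%s: %d" % (word, frequency))
```

To use it on the actual file, the text could be read with `open("charactermask.txt").read()` and passed in as `text`.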

Source

2012-09-17 00:26:42 wim

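The collections.Counter object is not available in Python 2.6. –

Oh, right. You can use 'defaultdict(int)' instead, as you were already doing. – wim

I can see why this would work, but I implemented it and now I'm back to getting nothing returned at all... –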

One problem is the loop

while letters >= 2: 
    top_words = sorted(words.iteritems(), 
         key=lambda(word, count): (-count, word))[:number]

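You are not looping over the words here, and nothing inside the loop changes letters, so this loop will run forever. You need to change the script so that this part actually iterates over all the words. (Also, you will probably want to change while to if, because you only need the code to execute once per word.)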
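
As a rough sketch of that change, keeping the structure of the original script: in the question, `letters = len(word)` is evaluated after the counting loop has finished, so it only measures the last word read. Moving the check inside the `for` loop applies it to every word. The `lines` list below is a made-up stand-in for `open("charactermask.txt")`.

```python
from collections import defaultdict
from string import punctuation

# Hypothetical stand-in for open("charactermask.txt").
lines = ["I saw a cat.", "A cat saw me!"]

words_gen = (word.strip(punctuation).lower()
             for line in lines
             for word in line.split())

words = defaultdict(int)
for word in words_gen:
    # Checked once per word, inside the loop; one-letter tokens are skipped.
    if len(word) >= 2:
        words[word] += 1
```

With the filter applied while counting, the later `sorted(...)` call no longer needs any length test of its own.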

Source

2012-09-17 00:14:29

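I've made that change now, and I'm at least getting words returned, but it still includes the letter 'a' as a word. How can I make this iterate over all the words? –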
