Print the 10 most common words

This program tries to print the 10 most common words in a file, but I can't get it to print them.

from string import punctuation

file = open('shakespeare.txt').read().lower().split()

number_of_words = 0

# Build a list of distinct words (each "in" check scans the whole list)
onlyOneWord = []

for i in file:
    if i in onlyOneWord:
        continue
    else:
        onlyOneWord.append(i)

lot_of_words = {}

# For every distinct word, walk the whole file again and count its occurrences
for all_Words in onlyOneWord:
    all_Words = all_Words.strip(punctuation)
    number_of_words = 0
    for orignal_file in file:
        orignal_file = orignal_file.strip(punctuation)
        if all_Words == orignal_file:
            number_of_words += 1
        lot_of_words[all_Words] = number_of_words

# y is a single int count here, so max(y) raises TypeError
for x, y in sorted(lot_of_words.items()):
    print(max(y))

Right now it prints the whole file.

I need it to print the 10 most common words, like this, and to run a lot faster:

the: 251, apple: 234, etc.


2017-12-02 Abdul Alkh

Please format your code. – enyard

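If I understand your question, you just need a different way of printing. Have a look at this: https://stackoverflow.com/questions/7197315/5-maximum-values-in-a-python-dictionary. As for making it run faster, you need to optimize your algorithm (i.e. iterate over the whole file fewer times). – DoesData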
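Following that suggestion, here is a minimal sketch that keeps the plain-dict approach from the question but reads the words only once, then picks the top entries with heapq.nlargest. That is one common way to get the largest values from a dictionary, not necessarily what the linked page recommends. The function name top_words and the sample text are hypothetical, for illustration only:

```python
import heapq
from operator import itemgetter
from string import punctuation


def top_words(text, n=10):
    """Count words in a single pass and return the n most frequent (word, count) pairs."""
    counts = {}
    for word in text.lower().split():
        word = word.strip(punctuation)  # trim punctuation at the word edges only
        if word:  # skip tokens that were pure punctuation, e.g. "--"
            counts[word] = counts.get(word, 0) + 1
    # nlargest avoids sorting the whole dictionary when only n items are needed
    return heapq.nlargest(n, counts.items(), key=itemgetter(1))


sample = "The apple, the pear; the apple. A pear? The end."
for word, count in top_words(sample, 3):
    print(f"{word}: {count}")
```

Each word is visited once, instead of once per distinct word as in the nested loops of the question.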

You can do this easily with collections.Counter.most_common. I also use str.translate to remove the punctuation.

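from collections import Counter
from string import punctuation

# Translation table that deletes every ASCII punctuation character
strip_punc = str.maketrans('', '', punctuation)

with open('shakespeare.txt') as f:
    # Lowercase, strip punctuation, split on whitespace, then count each word
    wordCount = Counter(f.read().lower().translate(strip_punc).split())

print(wordCount.most_common(10))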

This prints a list of tuples:

[('the', 251), ('apple', 100), ...]
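If you want the "word: count" lines from the question rather than a list of tuples, you can loop over most_common. A small sketch, using an in-memory string in place of shakespeare.txt so it runs on its own (the sample text and the helper name format_top are assumptions for illustration):

```python
from collections import Counter
from string import punctuation

# Table that deletes every ASCII punctuation character
strip_punc = str.maketrans('', '', punctuation)


def format_top(text, n=10):
    """Return lines like 'the: 3' for the n most common words in text."""
    word_count = Counter(text.lower().translate(strip_punc).split())
    return [f"{word}: {count}" for word, count in word_count.most_common(n)]


sample = "The cat and the hat. The end!"
for line in format_top(sample, 2):
    print(line)
```

Words with equal counts come out in the order they were first seen, so "cat" is listed before "and", "hat" and "end".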

Edit: we could speed this up by having the same translate call that strips the punctuation also change the letters to lowercase:

from string import punctuation, ascii_uppercase, ascii_lowercase

# Map A-Z to a-z and delete punctuation in a single pass (then drop .lower())
strip_punc = str.maketrans(ascii_uppercase, ascii_lowercase, punctuation)
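Building the table with A to Z mapped to a to z (the lowercasing direction the word count needs), a quick check on a short string shows what one translate pass does. Note that such a table only lowercases ASCII letters, unlike str.lower(), which also handles letters such as "É":

```python
from string import punctuation, ascii_uppercase, ascii_lowercase

# One table that maps A-Z to a-z and deletes every ASCII punctuation character
table = str.maketrans(ascii_uppercase, ascii_lowercase, punctuation)

text = "Hello, World! It's WELL-known."
print(text.translate(table))      # apostrophes and hyphens are removed too
print("Émile".translate(table))   # non-ASCII capitals are left unchanged
```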


2017-12-02 21:59:10

Note that this will also remove apostrophes and hyphens inside words, which the OP may not want. –

True. In that case you might want map(lambda x: x.strip(punctuation), f.read().lower().split()) –
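Combining that comment with the Counter answer gives a sketch that strips punctuation only from the ends of each word, so "don't" and "well-known" survive. The function name count_words is hypothetical, and the filter that drops empty strings (what a bare "--" token strips down to) is an extra assumption, not part of the comment:

```python
from collections import Counter
from string import punctuation


def count_words(text):
    """Count words, trimming punctuation at word edges but keeping inner ' and -."""
    stripped = map(lambda x: x.strip(punctuation), text.lower().split())
    # A token made only of punctuation (e.g. "--") strips to "", so drop it
    return Counter(w for w in stripped if w)


sample = "Don't stop -- it's a well-known fact, don't you know?"
print(count_words(sample).most_common(3))
```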
