Python - Counting words in a text file

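I'm new to Python and am working on a program that counts the instances of words in a simple text file. The program and the text file will be read from the command line, so I've included code in my program to check for command-line arguments. The code is below.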

import sys

count = {}

with open(sys.argv[1], 'r') as f:
    for line in f:
        for word in line.split():
            if word not in count:
                count[word] = 1
            else:
                count[word] += 1

print(word, count[word])

file.close()

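count is a dictionary that stores the words and the number of times they occur. I want to be able to print out each word and the number of times it occurs, from most occurrences to fewest.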

I'd like to know whether I'm on the right track and whether I'm using sys correctly. Thanks!!

2014-09-11 Delfino

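Looks good, reasonably Pythonic. You should deal with the newline at the end of each line, though: the last character will be '\n', which will mess up your count. You'd want to use 'for word in line[:-1].split():' or something. – 2014-09-11 03:05:56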

@Gaz Davidson: 'line.split()' will clean up all the whitespace. – 2014-09-11 03:30:56

You might like to use re.findall(r'\w+', ...) to split things into words, since it treats more than just whitespace as a delimiter... see [this example in the python docs](https://docs.python.org/2/library/collections.html#counter-objects) – reteptilian 2015-11-04 20:03:17
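To make the comments above concrete, here is a small sketch comparing str.split() with re.findall(r'\w+', ...). The sample line is made up for illustration; it is not from the asker's file.

```python
import re

# Hypothetical sample line, including punctuation and a trailing newline.
line = "Hello, world! Hello again.\n"

# split() with no arguments splits on any run of whitespace and
# discards leading/trailing whitespace, so the newline never shows up.
split_words = line.split()

# \w+ matches runs of letters, digits and underscores, so punctuation
# is left out of the words as well.
regex_words = re.findall(r'\w+', line)

print(split_words)
print(regex_words)
```

With split(), "Hello," and "Hello" are counted as different words; with the regular expression they are the same word. Which behaviour is wanted depends on the text being counted.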

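Your approach looks good to me. You could also use collections.Counter (assuming you're on Python 2.7 or newer) to get more information, such as the count of each word. My solution looks like this, and could probably be improved.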

import sys
from collections import Counter

c = Counter()
with open(sys.argv[1], 'r') as f:
    for line in f:
        # update() expects an iterable of words; passing a single string
        # would count its individual characters instead
        c.update(line.split())
for word in c:
    print(word, c[word])
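Since the question asks for output from most to least frequent, Counter.most_common() may be worth a look. This is only a minimal sketch; the word list is a made-up stand-in for the words read from the file.

```python
from collections import Counter

# Hypothetical input; in the real program these would come from the file.
words = "the cat and the dog and the bird".split()

c = Counter(words)

# most_common() returns (word, count) pairs sorted by descending count.
ranked = c.most_common()
for word, n in ranked:
    print(word, n)
```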

2014-09-11 03:17:29

Your final print isn't inside a loop, so it will only print the count for the last word you read, which is still the value of word.

Also, with the with context manager, you don't need to close() the file handle.

Finally, as pointed out in the comments, you'll want to strip the final newline from each line before you split it.

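For a program this simple it may not be worth the trouble, but you might want to look at defaultdict from collections to avoid the special case of initializing a new key in the dictionary.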
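A rough sketch of how the points above might look together, assuming defaultdict(int) for the counts. The count_words helper and the sample lines are hypothetical, standing in for the loop over an open file.

```python
from collections import defaultdict

def count_words(lines):
    """Count whitespace-separated words in an iterable of lines."""
    count = defaultdict(int)  # missing keys start at 0
    for line in lines:
        for word in line.split():
            count[word] += 1
    return count

# Hypothetical sample lines standing in for an open file object.
sample = ["one two two\n", "three three three\n"]
count = count_words(sample)

# The print sits inside a loop, so every word is reported,
# not just the last one that was read.
for word in count:
    print(word, count[word])
```

In the real program, the file object from with open(sys.argv[1]) as f could be passed to count_words(f) directly, and no close() call is needed afterwards.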

2014-09-11 03:40:31 tripleee

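I just noticed a typo: you open the file as f but you close it as file. As tripleee said, you shouldn't close a file that was opened in a with statement. Also, it's bad practice to use the names of built-in functions (like file or list) as your own identifiers. Sometimes it works, but sometimes it causes nasty bugs, and it confuses people who read your code. A syntax-highlighting editor can help you avoid this little problem.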

To print in order of descending count, you can do something like this with the data in your count dictionary:

items = list(count.items())
items.sort(key=lambda kv: kv[1], reverse=True)
print('\n'.join('%s: %d' % (k, v) for k, v in items))

See the Python Library Reference for more details on the list.sort() method and other handy dictionary methods.
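The snippet above sorts by count only, so words with equal counts come out in whatever order the dictionary yields them. If a stable, readable order for ties is wanted, one possible variation (a sketch, not part of the original answer) is to sort on a tuple key with the built-in sorted(). The counts below are invented for illustration.

```python
# Hypothetical counts, as the question's count dictionary might hold them.
count = {"pear": 2, "apple": 3, "fig": 2, "kiwi": 1}

# Sort by count descending; negating the count lets a single ascending
# sort also order tied words alphabetically.
items = sorted(count.items(), key=lambda kv: (-kv[1], kv[0]))

for word, n in items:
    print('%s: %d' % (word, n))
```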

2014-09-11 04:02:12

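I just did this using the re library. This was for the average number of words per line in a text file, but you would have to work out the number of words on each line yourself.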

import re

# This program gets the average number of words per line.
def main():
    # get the name of the file
    filename = input('Enter a filename: ')
    try:
        # open the file; the with block closes it automatically
        with open(filename, 'r') as infile:
            # read the file contents
            contents = infile.read()
    except IOError:
        print('An error occurred when trying to read')
        print('the file', filename)
        return

    # splitlines() also counts a last line with no trailing newline
    lines = len(contents.splitlines())
    count = len(re.findall(r'\w+', contents))
    # guard against an empty file to avoid dividing by zero
    average = count // lines if lines else 0

    # display the file contents
    print(contents)
    print('there is an average of', average, 'words per line')

# call main
main()

2017-11-13 01:10:11
