Counting unique words in Python

So far, my code looks like this:

from glob import glob

pattern = "D:\\report\\shakeall\\*.txt"
filelist = glob(pattern)

def countwords(fp):
    # Count the whitespace-separated words in one file.
    with open(fp) as fh:
        return len(fh.read().split())

total = sum(map(countwords, filelist))
print("There are %d words in the files. From directory %s" % (total, pattern))

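I would like to add code that counts the unique words across the files matched by the pattern (there are 42 txt files in this path), but I don't know how. Can anyone help me?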

2012-08-10 rocksland

By unique words, do you mean words that occur only once, or do you want the number of occurrences of each word? – 2012-08-10 10:37:38

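The best way to count objects in Python is the collections.Counter class, which was created for this purpose. It behaves like a Python dictionary, but is a bit easier to use for counting. You just pass it a list of objects and it counts them for you automatically.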

>>> from collections import Counter
>>> c = Counter(['hello', 'hello', 1])
>>> print(c)
Counter({'hello': 2, 1: 1})

Counter also has some useful methods, such as most_common; see the documentation for more.
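
As a small illustration of most_common (the sample sentence is made up, and this is only a sketch in Python 3 syntax), something like the following should list the most frequent words and also give the number of distinct words:

```python
from collections import Counter

# Sample text invented purely for illustration.
words = "the cat and the dog and the bird".split()
c = Counter(words)

# The two most frequent words, as (word, count) pairs.
top_two = c.most_common(2)
print(top_two)

# Each distinct word is one key, so len() gives the number of distinct words.
distinct = len(c)
print(distinct)
```

Note that len(c) counts distinct words, while sum(c.values()) would give the total number of words.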

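Another method of the Counter class that can be very useful is update. After you have created a Counter instance by passing it a list of objects, you can call update with another list and it will keep counting without discarding the old counts: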

>>> from collections import Counter
>>> c = Counter(['hello', 'hello', 1])
>>> print(c)
Counter({'hello': 2, 1: 1})
>>> c.update(['hello'])
>>> print(c)
Counter({'hello': 3, 1: 1})
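
Applied to the question, update could be called once per file so that a single Counter accumulates the counts for the whole directory. The following is only a rough sketch: the function name count_words_in_files is made up here, and lowercasing the words is an assumption about what should count as "the same word".

```python
from collections import Counter
from glob import glob


def count_words_in_files(pattern):
    """Return a Counter of words across all files matching pattern.

    Hypothetical helper for illustration; words are lowercased, which is
    an assumption and not something the question asks for explicitly.
    """
    counts = Counter()
    for path in glob(pattern):
        with open(path) as fh:
            # update() adds to the running totals instead of replacing them.
            counts.update(word.lower() for word in fh.read().split())
    return counts
```

With the pattern from the question, counts = count_words_in_files("D:\\report\\shakeall\\*.txt") would give the total word count as sum(counts.values()), the number of distinct words as len(counts), and the words that occur exactly once as [w for w, n in counts.items() if n == 1].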

2012-08-10 10:43:09

It looks like I posted an answer very similar to yours. I am deleting mine, but I suggest you add the 'update()' method of the 'Counter' object. – 2012-08-10 10:54:19

Thank you very much – rocksland 2012-08-10 10:57:08

If you want the count of each unique word, then use a dictionary:

words = ['Hello', 'world', 'world']
count = {}
for word in words:
    if word in count:
        count[word] += 1
    else:
        count[word] = 1

and you will get the dictionary

{'Hello': 1, 'world': 2}
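
Once the dictionary is built, both readings of "unique" can be derived from it. A minimal sketch (the names distinct and singletons are made up for illustration, and dict.get is used here as a shorter variant of the if/else above):

```python
words = ['Hello', 'world', 'world']

count = {}
for word in words:
    # dict.get returns 0 for a word not seen yet, so no if/else is needed.
    count[word] = count.get(word, 0) + 1

# Number of distinct words: one key per distinct word.
distinct = len(count)

# Words that appear exactly once.
singletons = [w for w, n in count.items() if n == 1]

print(count)
print(distinct)
print(singletons)
```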

2012-08-10 10:36:32

Where is the counting? – 2012-08-10 10:37:13

Also, 'set()' would be a better choice. – 2012-08-10 10:38:16

'len(unique(words))', of course – 2012-08-10 10:38:31

print(len(set(w.lower() for w in open('filename.dat').read().split())))

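This reads the whole file into memory, splits it into words on whitespace, converts each word to lowercase, builds a set (which keeps only unique items) from the lowercase words, counts them and prints the result.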
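
For the 42 files in the question, the same idea could be extended by filling one set from every file matched by the glob pattern. This is only a sketch under assumptions: the function name unique_words is made up, and the files are read line by line instead of all at once.

```python
from glob import glob


def unique_words(pattern):
    """Return the set of lowercased words in all files matching pattern.

    Hypothetical helper for illustration only.
    """
    seen = set()
    for path in glob(pattern):
        with open(path) as fh:  # the with block closes each file when done
            for line in fh:  # iterate line by line instead of read()
                # A set ignores duplicates, so repeated words are kept once.
                seen.update(w.lower() for w in line.split())
    return seen
```

len(unique_words("D:\\report\\shakeall\\*.txt")) would then be the number of distinct words. Since the split is on whitespace only, a word with attached punctuation such as "word," is counted separately from "word".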

2012-08-10 10:43:17
