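FreqDist with NLTK
I want to use Python to get the frequency distribution of a set of documents. For some reason my code doesn't work, and it produces this error: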
Traceback (most recent call last):
  File "C:\Documents and Settings\aschein\Desktop\freqdist", line 32, in <module>
    fd = FreqDist(corpus_text)
  File "C:\Python26\lib\site-packages\nltk\probability.py", line 104, in __init__
    self.update(samples)
  File "C:\Python26\lib\site-packages\nltk\probability.py", line 472, in update
    self.inc(sample, count=count)
  File "C:\Python26\lib\site-packages\nltk\probability.py", line 120, in inc
    self[sample] = self.get(sample,0) + count
TypeError: unhashable type: 'list'
Can you help?
Here is the code so far:
import os
import nltk
from nltk.probability import FreqDist

# The stop-words list
stopwords_doc = open("C:\\Documents and Settings\\aschein\\My Documents\\stopwords.txt").read()
stopwords_list = stopwords_doc.split()
stopwords = nltk.Text(stopwords_list)

corpus = []

# Directory of documents
directory = "C:\\Documents and Settings\\aschein\\My Documents\\comments"
listing = os.listdir(directory)

# Append all documents in directory into a single 'document' (list)
for doc in listing:
    doc_name = "C:\\Documents and Settings\\aschein\\My Documents\\comments\\" + doc
    input = open(doc_name).read()
    input = input.split()
    corpus.append(input)

# Turn list into Text form for NLTK
corpus_text = nltk.Text(corpus)

# Remove stop-words
for w in corpus_text:
    if w in stopwords:
        corpus_text.remove(w)

fd = FreqDist(corpus_text)
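dmh is absolutely right. You don't need the 'Text()' function from NLTK here. Your 'corpus' list should work with FreqDist directly, as long as it is a flat list of words rather than a list of lists (build it with extend instead of append). – 2012-05-20 16:40:55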
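The traceback comes from `corpus.append(input)`: each document's word list is added as a single item, so `FreqDist` ends up trying to hash a list. Below is a minimal sketch of one way to restructure the script along the lines the comment suggests. The function name `build_freqdist` and its parameters are hypothetical, not part of NLTK, and the `Counter` fallback is only there so the sketch runs without NLTK installed (in NLTK 3, `FreqDist` is a `Counter` subclass).

```python
import os

try:
    from nltk.probability import FreqDist
except ImportError:
    # Assumption: fall back to Counter so the sketch runs without NLTK.
    from collections import Counter as FreqDist


def build_freqdist(directory, stopwords_path):
    """Count non-stop-word tokens across every file in `directory`."""
    with open(stopwords_path) as f:
        stopwords = set(f.read().split())

    corpus = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path) as f:
            # extend() adds each word individually; append() would add the
            # whole list as one unhashable item, which caused the TypeError.
            corpus.extend(f.read().split())

    # Build a new filtered list instead of removing items while iterating.
    tokens = [w for w in corpus if w not in stopwords]
    return FreqDist(tokens)
```

Called with the comments directory and the stop-words file from the question, this should return a distribution keyed by word, so `fd["someword"]` gives that word's count. On NLTK 3, `fd.most_common(10)` lists the ten most frequent words.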