From NLTK

2017-02-08

I am using the approach below, but it always throws an "invalid file" error:

import nltk

Then:

file=open(nltk.corpus.gutenberg.words('austen-persuasion.txt'),"r").read().split().lower() 
wordcount={} 

for word in file: 
    if word not in wordcount: 
     wordcount[word] = 1 
    else: 
     wordcount[word] += 1 
print ("The frequency of each word in the text file is as follows :") 
for k,v in wordcount.items(): 
    print (k, v)

The error is as follows:

TypeError         Traceback (most recent call last) 
<ipython-input-88-de499228f7ab> in <module>() 
    1 import nltk 
----> 2 file=open(nltk.corpus.gutenberg.words('austen-persuasion.txt'),'r').read().split() 
    3 #file = nltk.corpus.gutenberg.words('austen-persuasion.txt') 
    4 wordcount={} 
    5 

TypeError: invalid file: ['[', 'Persuasion', 'by', 'Jane', 'Austen', '1818', ...]

Source

2017-02-08 Mitesh Puthran

You don't need to split() the file; the nltk function does that for you. – patito

Answer

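As @patito mentioned in the comments, you don't need read() or split(), because nltk already returns the text as a list of words. The TypeError comes from passing that list to open(), which expects a file path. You can see the word list for yourself: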

>>> file = nltk.corpus.gutenberg.words('austen-persuasion.txt') 
>>> file[0:10] 
[u'[', u'Persuasion', u'by', u'Jane', u'Austen', u'1818', u']', u'Chapter', u'1', u'Sir']

You also need to fix the indentation in your word count, but otherwise it will work for you.
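As a rough sketch of what the corrected word count could look like: the loop from the question is kept as-is, but it iterates over the word list directly instead of going through open(). The helper names `count_words` and `load_persuasion_words` are made up for illustration, and the loader assumes nltk is installed and the gutenberg corpus has already been downloaded (for example with `nltk.download('gutenberg')`).

```python
def count_words(words):
    """Return a dict mapping each word to the number of times it occurs."""
    wordcount = {}
    for word in words:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
    return wordcount


def load_persuasion_words():
    """Hypothetical helper: load the corpus as a list-like sequence of words.

    Assumes nltk is installed and the gutenberg corpus is available locally.
    """
    import nltk  # imported lazily so count_words works without nltk
    return nltk.corpus.gutenberg.words('austen-persuasion.txt')


# Small stand-in sample so the sketch runs without the corpus;
# with nltk set up, pass load_persuasion_words() instead.
sample = ['[', 'Persuasion', 'by', 'Jane', 'Austen', '1818', ']', 'by']
counts = count_words(sample)

print("The frequency of each word in the text file is as follows :")
for k, v in counts.items():
    print(k, v)
```

With the corpus available, `count_words(load_persuasion_words())` should give the counts for the whole book.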

Source

2017-02-08 18:17:37 Tchotchke

The indentation is perfectly fine, but I can't use .lower() on file to convert all the text to lowercase. –

Just use a list comprehension: `file = [word.lower() for word in file]`. And the indentation as pasted above doesn't work - you need to indent after `for word in file:` – Tchotchke

Thanks, it works fine. Sorry, I pasted the code here with the wrong indentation. –
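For reference, a minimal sketch of the lowercasing step from the comments: .lower() is a string method, so it has to be applied to each word rather than to the list. The function name `count_words_lowercase` and the sample list are illustrative only, and `collections.Counter` is swapped in here as a shorter alternative to the hand-written dict loop in the question.

```python
from collections import Counter


def count_words_lowercase(words):
    """Lowercase every word, then count occurrences case-insensitively."""
    lowered = [word.lower() for word in words]  # .lower() per word, not on the list
    return Counter(lowered)


# Stand-in sample; the result of nltk.corpus.gutenberg.words(...) could be
# passed in the same way, since it is also a sequence of strings.
sample = ['Sir', 'Walter', 'sir', 'SIR', 'Elliot']
counts = count_words_lowercase(sample)

for word, n in counts.items():
    print(word, n)
```

Since `Counter` is a dict subclass, the `for k, v in wordcount.items()` printing loop from the question works on it unchanged.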
