2013-04-24

Python KeyError: '' in automatic language detection using stopwords

I'm doing automatic language detection in Python using stopwords, but when I try to test the code I get a KeyError. Here is the code:

import nltk 
from nltk.corpus import stopwords 

def scoreFunction(wholetext):
    dictiolist = {}
    scorelist = {}
    NLTKlanguages = ["dutch", "finnish", "german", "italian", "portuguese", "spanish", "turkish", "danish", "english", "french", "hungarian", "norwegian", "russian", "swedish"]
    FREElanguages = [""]
    languages = NLTKlanguages + FREElanguages
    for lang in NLTKlanguages:
        dictiolist[lang] = stopwords.words(lang)
        tokens = nltk.tokenize.word_tokenize(wholetext)
        tokens = [t.lower() for t in tokens]
        freq_dist = nltk.FreqDist(tokens)
    for lang in languages:
        scorelist[lang] = 0
    for word in freq_dist.keys()[0:20]:
        if word in dictiolist[lang]:
            scorelist[lang] += 1
    return scorelist

def whichLanguage(scorelist):
    maximum = 0
    lang = None  # avoids UnboundLocalError when every score is 0
    for item in scorelist:
        value = scorelist[item]
        if maximum < value:
            maximum = value
            lang = item
    return lang

When I run it as scoreFunction("hello my name is osfar and i'm very genius"), I get this error:

Traceback (most recent call last):
  File "", line 1, in
    scoreFunction("hello my name is osfar and i'm very genius")
  File "C:/Users/osama1/Desktop/fun-test", line 17, in scoreFunction
    if word in dictiolist[lang]:
KeyError: ''

Add all relevant information to your actual post, not in comments. – 2013-04-24 08:40:23

Answer


Your problem is in the following block of code:

for word in freq_dist.keys()[0:20]:
    if word in dictiolist[lang]:
        scorelist[lang] += 1

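You are using the variable lang inside this for loop, but nothing in this loop assigns it. In Python a loop variable keeps its last value after its loop finishes, so lang still holds whatever the previous loop left behind. Here that is "" (the empty string), the last element of languages, and dictiolist has no "" key, hence KeyError: ''.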
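The leftover-loop-variable behavior can be reproduced without NLTK. This is a minimal sketch: `stopword_lists` is a hypothetical stand-in for `dictiolist` with two made-up entries, and `languages` mimics the question's list ending in "".

```python
# A for-loop variable keeps its last value after the loop ends.
languages = ["english", "french", ""]  # ends with "", like NLTKlanguages + FREElanguages
# Hypothetical stand-in for dictiolist: no entry for ""
stopword_lists = {"english": {"the", "is"}, "french": {"le", "est"}}

for lang in languages:
    pass  # e.g. initialising a score for each language

# lang is still bound here, to the last item: the empty string
print(repr(lang))  # ''

caught = None
try:
    stopword_lists[lang]  # same lookup as dictiolist[lang] in the question
except KeyError as err:
    caught = err
    print("KeyError:", err)  # KeyError: ''
```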

What you apparently meant to do is:

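for word in freq_dist.keys()[0:20]:
    # loop over dictiolist (the NLTK languages) rather than languages:
    # "" has no stopword list, so dictiolist[""] would still raise KeyError
    for lang in dictiolist:
        if word in dictiolist[lang]:
            scorelist[lang] += 1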

By the way, there is a simpler way to do what you want: use a Counter. For more information, see http://docs.python.org/2.7/library/collections.html#counter-objects
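A rough sketch of the Counter approach, under a few assumptions: `STOPWORDS` holds tiny made-up word sets (in the question they would come from `stopwords.words(lang)`), tokenization is a naive `split()` instead of `nltk.word_tokenize`, and the names `score_languages` and `which_language` are hypothetical. `Counter.most_common` picks the most frequent words and also replaces the manual maximum search in `whichLanguage`.

```python
from collections import Counter

# Hypothetical, tiny stopword sets for illustration only
STOPWORDS = {
    "english": {"the", "is", "my", "and", "i", "a"},
    "spanish": {"el", "es", "mi", "y", "yo", "un"},
}

def score_languages(text, stopwords_by_lang, top_n=20):
    """Count how many of the text's top_n most frequent words are stopwords of each language."""
    # Naive whitespace tokenisation (an assumption; the question uses nltk.word_tokenize)
    freq = Counter(text.lower().split())
    top_words = [word for word, _ in freq.most_common(top_n)]
    return Counter({
        lang: sum(1 for w in top_words if w in words)
        for lang, words in stopwords_by_lang.items()
    })

def which_language(scores):
    """Return the best-scoring language, or None if nothing matched."""
    if not scores:
        return None
    lang, best = scores.most_common(1)[0]
    return lang if best > 0 else None

scores = score_languages("hello my name is osfar and i'm very genius", STOPWORDS)
print(scores)
print(which_language(scores))  # english
```

With this naive tokenizer "i'm" stays one token, so only "my", "is" and "and" count toward English.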