2017-05-14

IndexError: list index out of range when calling tag_sents() on NLTK's SennaTagger

An "IndexError: list index out of range" is raised when I call the tag_sents() method of NLTK's SennaTagger (http://www.nltk.org/_modules/nltk/tag/senna.html).

A list of sentences is passed as the input to tag_sents.

The SENNA executable is required to run this code. The installation guide for the SENNA toolkit can be found here: http://ronan.collobert.com/senna/

Code:

from nltk.tag import SennaTagger 

SENNA_EXECUTABLE_DIR = '../../tools/senna' 

pos_tagger = SennaTagger(SENNA_EXECUTABLE_DIR) 

tagged = pos_tagger.tag_sents(["All the banks are closed", "Today is Sunday"]) 

Output:

Traceback (most recent call last):
  File "<ipython-input-90-886051c3d91d>", line 1, in <module>
    tagged = pos_tagger.tag_sents(["All the banks are closed", "Today is Sunday"])
  File "F:\Programs\Anaconda3\lib\site-packages\nltk\tag\senna.py", line 55, in tag_sents
    tagged_sents = super(SennaTagger, self).tag_sents(sentences)
  File "F:\Programs\Anaconda3\lib\site-packages\nltk\classify\senna.py", line 161, in tag_sents
    result[tag] = tags[map_[tag]].strip()
IndexError: list index out of range

Answer


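The input to senna.tag_sents should be a list of lists of strings, i.e. a list of already-tokenized sentences, not a list of plain strings. You can produce it with [word_tokenize(sent) for sent in sents]: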

>>> from nltk import word_tokenize 
>>> from nltk.tag import SennaTagger 
>>> senna = SennaTagger('/home/alvas/senna/') 
>>> sents = ["All the banks are closed", "Today is Sunday"] 

>>> tokenized_sents = [word_tokenize(sent) for sent in sents] 
>>> senna.tag_sents(tokenized_sents) 
[[('All', u'PDT'), ('the', u'DT'), ('banks', u'NNS'), ('are', u'VBP'), ('closed', u'VBN')], [('Today', u'NN'), ('is', u'VBZ'), ('Sunday', u'NNP')]] 
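Why plain strings go wrong: as far as I can tell from the NLTK 3.x source (nltk/classify/senna.py), the wrapper builds SENNA's input by joining the elements of each sentence with spaces. Iterating over a Python string yields its characters, so a raw sentence is sent to SENNA as one "token" per character, and the rows SENNA writes back no longer correspond to the words the wrapper expects, which is plausibly where the IndexError comes from. The sketch below uses a hypothetical build_senna_input() that only mimics that joining step; it does not call NLTK or SENNA.

```python
def build_senna_input(sentences):
    """Hypothetical stand-in for the wrapper's input step (an assumption,
    not NLTK's API): one line per sentence, elements joined by spaces."""
    return "\n".join(" ".join(s) for s in sentences) + "\n"


raw = ["Today is Sunday"]            # list of plain strings (the question)
tokenized = [["Today", "is", "Sunday"]]  # list of token lists (the answer)

# A string is iterated character by character, so every letter becomes a "token".
print(repr(build_senna_input(raw)))        # 'T o d a y   i s   S u n d a y\n'
# A token list is joined word by word, which is what SENNA should receive.
print(repr(build_senna_input(tokenized)))  # 'Today is Sunday\n'
```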

Alternatively, if you prefer map(), use it to tokenize the sentences before tagging:

>>> tokenized_sents = list(map(word_tokenize, sents))  # list() because map() returns a one-shot iterator on Python 3
>>> senna.tag_sents(tokenized_sents) 
[[('All', u'PDT'), ('the', u'DT'), ('banks', u'NNS'), ('are', u'VBP'), ('closed', u'VBN')], [('Today', u'NN'), ('is', u'VBZ'), ('Sunday', u'NNP')]]
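If the tagger may receive either raw strings or token lists (or a lazy map object on Python 3, as with the Anaconda3 install in the question), a small guard can normalise the input first. This is a sketch: ensure_token_lists is a hypothetical helper, not part of NLTK, and str.split stands in for word_tokenize so the example runs without NLTK or its tokenizer data.

```python
from typing import Callable, Iterable, List, Sequence, Union

Sentence = Union[str, Sequence[str]]


def ensure_token_lists(
    sentences: Iterable[Sentence],
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[List[str]]:
    """Return a list of token lists, the shape tag_sents() expects.

    Plain strings are tokenized with `tokenize` (str.split by default, a
    stand-in for nltk.word_tokenize); already-tokenized sentences are copied
    into lists. Building a list also materialises lazy iterators such as
    the object returned by map() on Python 3.
    """
    result: List[List[str]] = []
    for sent in sentences:
        if isinstance(sent, str):
            result.append(tokenize(sent))
        else:
            result.append(list(sent))
    return result


sents = ["All the banks are closed", "Today is Sunday"]
print(ensure_token_lists(sents))
# [['All', 'the', 'banks', 'are', 'closed'], ['Today', 'is', 'Sunday']]
print(ensure_token_lists(map(str.split, sents)))  # same result from a map object
```

With NLTK installed, the call would then look like senna.tag_sents(ensure_token_lists(sents, word_tokenize)), so word_tokenize rather than str.split handles punctuation.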