2016-09-21 60 views
3

I am trying to use PerceptronTagger from the nltk package in Python 3.5, but I get the error TypeError: 'LazySubsequence' object does not support item assignment

I want to train it on data from the Brown corpus with the universal tagset.

This is the code I was running when I hit the problem.

import nltk,math 
tagged_sentences = nltk.corpus.brown.tagged_sents(categories='news',tagset='universal') 
i = math.floor(len(tagged_sentences)*0.2) 
testing_sentences = tagged_sentences[0:i] 
training_sentences = tagged_sentences[i:] 
perceptron_tagger = nltk.tag.perceptron.PerceptronTagger(load=False) 
perceptron_tagger.train(training_sentences) 

It does not train correctly and gives the following stack trace.

--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-10-61332d63d2c3> in <module>() 
     1 perceptron_tagger = nltk.tag.perceptron.PerceptronTagger(load=False) 
----> 2 perceptron_tagger.train(training_sentences) 

/home/nathan/anaconda3/lib/python3.5/site-packages/nltk/tag/perceptron.py in train(self, sentences, save_loc, nr_iter) 
    192      c += guess == tags[i] 
    193      n += 1 
--> 194    random.shuffle(sentences) 
    195    logging.info("Iter {0}: {1}/{2}={3}".format(iter_, c, n, _pc(c, n))) 
    196   self.model.average_weights() 

/home/nathan/anaconda3/lib/python3.5/random.py in shuffle(self, x, random) 
    270     # pick an element in x[:i+1] with which to exchange x[i] 
    271     j = randbelow(i+1) 
--> 272     x[i], x[j] = x[j], x[i] 
    273   else: 
    274    _int = int 

TypeError: 'LazySubsequence' object does not support item assignment 

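The error seems to come from the shuffle function in the random module, but that does not really seem right.

Is there something else that could be causing the problem? Has anyone else run into this?

I am running Anaconda Python 3.5 on Ubuntu 16.04.1. The nltk version is 3.2.1.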

Answers

2

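NLTK has a number of custom "lazy" types, which are meant to ease the handling of large bodies of data, such as annotated corpora. In many respects they behave like standard lists, tuples, dicts and so on, but they avoid taking up more memory than necessary.

One example is LazySubsequence, which is the result of the slice expression tagged_sentences[i:]. If tagged_sentences were a normal list, splitting the data into test/training sets would create a full copy of the data. Instead, this LazySubsequence is a view onto part of the original sequence.

While the memory savings are probably a good thing, the problem is that this view is read-only. Apparently PerceptronTagger wants to shuffle its input data in place, which is not allowed, hence the exception.

A quick (though perhaps not the most elegant) solution is to give the tagger a mutable copy of the data: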

perceptron_tagger.train(list(training_sentences))

You may have to do the same with the test data.
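The read-only behaviour can be reproduced without NLTK. The sketch below uses a hypothetical `ReadOnlyView` class (an illustration only, not an NLTK class) that supports indexing and `len()` but not item assignment, which is roughly the situation `random.shuffle` runs into with a `LazySubsequence`. Copying the view into a `list` makes the shuffle work.

```python
import random


class ReadOnlyView:
    """Hypothetical stand-in for a lazy, read-only slice (not NLTK code)."""

    def __init__(self, source, start, stop):
        self._source = source
        self._start = start
        self._stop = stop

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Values are looked up in the source sequence on demand.
        return self._source[self._start + index]


data = list(range(10))
view = ReadOnlyView(data, 2, 10)

try:
    # shuffle swaps elements in place, so it needs item assignment.
    random.shuffle(view)
    shuffle_error = None
except TypeError as exc:
    shuffle_error = str(exc)

mutable_copy = list(view)      # materialise the view into a real list
random.shuffle(mutable_copy)   # works: lists support item assignment
```

The original `data` is left untouched; only the copy is reordered, at the cost of holding the copied items in memory.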

+0

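It looks like you wrote an answer while I was writing mine. I came to the same conclusion, so I will mark your answer as correct, since I appreciate the effort. –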

+1

Nice that you found the solution yourself! These NLTK containers can be quite tricky sometimes... – lenz

5

Debugging

Some grepping through the nltk source code turned up the answer.

The class is declared in the file site-packages/nltk/util.py.

class LazySubsequence(AbstractLazySequence): 
    """                                         
    A subsequence produced by slicing a lazy sequence. This slice                          
    keeps a reference to its source sequence, and generates its values                         
    by looking them up in the source sequence.                               
    """ 

From the interpreter, I checked the type of tagged_sentences:

>>> import nltk 
>>> tagged_sentences = nltk.corpus.brown.tagged_sents(categories='news',tagset='universal') 
>>> type(tagged_sentences) 
<class 'nltk.corpus.reader.util.ConcatenatedCorpusView'> 

The type() output led me to the file site-packages/nltk/corpus/reader/util.py:

class ConcatenatedCorpusView(AbstractLazySequence): 
    """                                         
    A 'view' of a corpus file that joins together one or more                            
    ``StreamBackedCorpusViews<StreamBackedCorpusView>``. At most                           
    one file handle is left open at any time.                                
    """ 

Finally, one more quick test with the random package, shown in detail below, confirms that the problem lies in how I created tagged_sentences:

>>> import random 
>>> random.shuffle(training_sentences) 
--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-30-0b03f0366949> in <module>() 
     1 import random 
----> 2 random.shuffle(training_sentences) 
     3 
     4 
     5 

/home/nathan/anaconda3/lib/python3.5/random.py in shuffle(self, x, random) 
    270     # pick an element in x[:i+1] with which to exchange x[i] 
    271     j = randbelow(i+1) 
--> 272     x[i], x[j] = x[j], x[i] 
    273   else: 
    274    _int = int 

TypeError: 'LazySubsequence' object does not support item assignment 
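A similar check can be made without triggering a traceback. The sketch below assumes that an object can be shuffled in place only if its type defines `__setitem__`, which is what the error message points at; `supports_item_assignment` and `IndexOnly` are hypothetical names introduced for this illustration and are not part of NLTK.

```python
def supports_item_assignment(obj):
    """Return True if obj's type defines __setitem__ (needed by random.shuffle)."""
    return hasattr(type(obj), "__setitem__")


class IndexOnly:
    """Hypothetical read-only sequence used only for this illustration."""

    def __init__(self, items):
        self._items = tuple(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


checks = {
    "list": supports_item_assignment([1, 2, 3]),
    "tuple": supports_item_assignment((1, 2, 3)),
    "IndexOnly": supports_item_assignment(IndexOnly([1, 2, 3])),
}
print(checks)
```

Only the list passes the check. A tuple fails just like the read-only class, so a copy meant for shuffling has to be a list.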

Solution

To fix the error, just explicitly create a list of the sentences from nltk.corpus.brown; random can then shuffle the data normally.

import nltk,math 
# explicitly make list, then LazySequence will traverse all items 
tagged_sentences = [sentence for sentence in nltk.corpus.brown.tagged_sents(categories='news',tagset='universal')] 
i = math.floor(len(tagged_sentences)*0.2) 
testing_sentences = tagged_sentences[0:i] 
training_sentences = tagged_sentences[i:] 
perceptron_tagger = nltk.tag.perceptron.PerceptronTagger(load=False) 
perceptron_tagger.train(training_sentences) 
# no error, yea! 

Tagging now works fine.

>>> perceptron_tagger_preds = [] 
>>> for test_sentence in testing_sentences: 
...     perceptron_tagger_preds.append(perceptron_tagger.tag([word for word,_ in test_sentence]))
>>> print(perceptron_tagger_preds[676]) 
[('Formula', 'NOUN'), ('is', 'VERB'), ('due', 'ADJ'), ('this', 'DET'), ('week', 'NOUN')] 
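
To go beyond inspecting a single sentence, the predictions can be compared with the gold tags token by token. This is a minimal sketch in plain Python; `token_accuracy` is a hypothetical helper name, and the example data is made up for illustration rather than taken from real tagger output.

```python
def token_accuracy(gold_sentences, predicted_sentences):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for gold, predicted in zip(gold_sentences, predicted_sentences):
        for (_, gold_tag), (_, predicted_tag) in zip(gold, predicted):
            correct += gold_tag == predicted_tag
            total += 1
    # Guard against empty input to avoid dividing by zero.
    return correct / total if total else 0.0


# Tiny made-up example data, not real tagger output.
gold = [[("Formula", "NOUN"), ("is", "VERB"), ("due", "ADJ")]]
predicted = [[("Formula", "NOUN"), ("is", "VERB"), ("due", "ADV")]]
accuracy = token_accuracy(gold, predicted)
```

With the variables from the session above, the call would be token_accuracy(testing_sentences, perceptron_tagger_preds).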