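NLTK perceptron tagger "TypeError: 'LazySubsequence' object does not support item assignment"

I'm trying to use the PerceptronTagger from the nltk package with Python 3.5, but I get the error TypeError: 'LazySubsequence' object does not support item assignment.

I want to train it on data from the Brown corpus using the universal tagset.

This is the code I was running when I hit the problem.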

import nltk,math 
tagged_sentences = nltk.corpus.brown.tagged_sents(categories='news',tagset='universal') 
i = math.floor(len(tagged_sentences)*0.2) 
testing_sentences = tagged_sentences[0:i] 
training_sentences = tagged_sentences[i:] 
perceptron_tagger = nltk.tag.perceptron.PerceptronTagger(load=False) 
perceptron_tagger.train(training_sentences)

It doesn't train properly and gives the following stack trace.

--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-10-61332d63d2c3> in <module>() 
     1 perceptron_tagger = nltk.tag.perceptron.PerceptronTagger(load=False) 
----> 2 perceptron_tagger.train(training_sentences) 

/home/nathan/anaconda3/lib/python3.5/site-packages/nltk/tag/perceptron.py in train(self, sentences, save_loc, nr_iter) 
    192      c += guess == tags[i] 
    193      n += 1 
--> 194    random.shuffle(sentences) 
    195    logging.info("Iter {0}: {1}/{2}={3}".format(iter_, c, n, _pc(c, n))) 
    196   self.model.average_weights() 

/home/nathan/anaconda3/lib/python3.5/random.py in shuffle(self, x, random) 
    270     # pick an element in x[:i+1] with which to exchange x[i] 
    271     j = randbelow(i+1) 
--> 272     x[i], x[j] = x[j], x[i] 
    273   else: 
    274    _int = int 

TypeError: 'LazySubsequence' object does not support item assignment

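This seems to come from the shuffle function in the random module, but that doesn't really seem right.

Is there something else that could be causing the problem? Has anyone else run into this?

I'm running Anaconda Python 3.5 on Ubuntu 16.04.1. The nltk version is 3.2.1.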


2016-09-21 Nathan McCoy

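NLTK has a lot of custom "lazy" types, which are meant to make it easier to work with large bodies of data (such as annotated corpora). In many respects they behave like standard lists, tuples, dicts and so on, but they avoid needlessly taking up a lot of memory.

One example of this is LazySubsequence, which is the result of the slice expression tagged_sentences[i:]. If tagged_sentences were a normal list, splitting the data into test and training sets would create a copy of the data. Instead, this LazySubsequence is a view onto part of the original sequence.

While the memory savings are probably a good thing, the problem is that this view is read-only. Apparently PerceptronTagger wants to shuffle its input data, which is not allowed, hence the exception.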
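The same behaviour can be reproduced with only the standard library, using range as a stand-in for NLTK's lazy sequences (this is an analogy, not NLTK code): slicing a range gives another lazy range rather than a copy, and random.shuffle fails on it for the same reason.

```python
import random

# range is a lazy, read-only sequence: like a LazySubsequence, slicing it
# does not copy any elements, it just describes a window of the original.
numbers = range(10)
tail = numbers[2:]          # another range object, not a list
print(type(tail).__name__, list(tail))

# Shuffling needs item assignment (x[i], x[j] = x[j], x[i]), which a
# read-only sequence does not support, so this raises TypeError.
try:
    random.shuffle(tail)
except TypeError as exc:
    print("TypeError:", exc)

# Materialising the slice into a list gives a mutable copy that shuffles fine.
tail_copy = list(tail)
random.shuffle(tail_copy)
print(sorted(tail_copy) == list(tail))
```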

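A quick (but perhaps not the most elegant) solution is to give the tagger a copy of the data as a plain list:

perceptron_tagger.train(list(training_sentences))

You may have to do the same with the test data.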
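One detail worth checking: the copy has to be mutable, because random.shuffle swaps elements in place. A tuple copy is just as read-only as the lazy view, while a list copy works. The sketch below uses a few hypothetical toy sentences in place of the Brown corpus data, and can_shuffle is an illustrative helper name.

```python
import random

# Hypothetical stand-in for tagged sentences: lists of (word, tag) pairs.
sentences = [
    [("The", "DET"), ("cat", "NOUN")],
    [("Dogs", "NOUN"), ("bark", "VERB")],
    [("It", "PRON"), ("rains", "VERB")],
]

def can_shuffle(seq):
    """Return True if random.shuffle accepts seq, False if it raises TypeError."""
    try:
        random.shuffle(seq)
        return True
    except TypeError:
        return False

print("tuple copy:", can_shuffle(tuple(sentences)))  # tuples are immutable
print("list copy:", can_shuffle(list(sentences)))    # lists support item assignment
```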


2016-09-22 12:09:27 lenz

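Looks like you wrote an answer while I was writing mine. I came to the same conclusion, so I'll mark yours as correct, since I appreciate the effort.

Nice that you found the solution yourself! These NLTK containers can be quite tricky sometimes... – lenz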

Debugging

Some grepping through the nltk source code turned up the answer.

The class is declared in the file site-packages/nltk/util.py.

class LazySubsequence(AbstractLazySequence): 
    """                                         
    A subsequence produced by slicing a lazy sequence. This slice                          
    keeps a reference to its source sequence, and generates its values                         
    by looking them up in the source sequence.                               
    """

In the interpreter I checked the type() of tagged_sentences:

>>> import nltk 
>>> tagged_sentences = nltk.corpus.brown.tagged_sents(categories='news',tagset='universal') 
>>> type(tagged_sentences) 
<class 'nltk.corpus.reader.util.ConcatenatedCorpusView'>

That class is declared in the file site-packages/nltk/corpus/reader/util.py:

class ConcatenatedCorpusView(AbstractLazySequence): 
    """                                         
    A 'view' of a corpus file that joins together one or more                            
    ``StreamBackedCorpusViews<StreamBackedCorpusView>``. At most                           
    one file handle is left open at any time.                                
    """

One last quick test with the random package, shown below, proved that the problem lies in how I created tagged_sentences.

>>> import random 
>>> random.shuffle(training_sentences) 
--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-30-0b03f0366949> in <module>() 
     1 import random 
----> 2 random.shuffle(training_sentences) 
     3 
     4 
     5 

/home/nathan/anaconda3/lib/python3.5/random.py in shuffle(self, x, random) 
    270     # pick an element in x[:i+1] with which to exchange x[i] 
    271     j = randbelow(i+1) 
--> 272     x[i], x[j] = x[j], x[i] 
    273   else: 
    274    _int = int 

TypeError: 'LazySubsequence' object does not support item assignment

Solution

To fix the error, simply build an explicit list of the sentences from nltk.corpus.brown; random can then shuffle the data normally.

import nltk,math 
# explicitly make a list, so the lazy sequence is traversed over all items
tagged_sentences = list(nltk.corpus.brown.tagged_sents(categories='news', tagset='universal'))
i = math.floor(len(tagged_sentences)*0.2) 
testing_sentences = tagged_sentences[0:i] 
training_sentences = tagged_sentences[i:] 
perceptron_tagger = nltk.tag.perceptron.PerceptronTagger(load=False) 
perceptron_tagger.train(training_sentences) 
# no error, yea!
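The same fix can be packaged as a small helper that materialises the corpus once and returns plain lists for both parts, so either one can be shuffled. split_train_test and test_fraction are hypothetical names; the 20% test share and the test-first ordering mirror the code above, and a toy list stands in for the corpus so the sketch runs without downloading Brown.

```python
import math
import random

def split_train_test(sentences, test_fraction=0.2):
    """Return (testing, training) as independent, mutable lists.

    The first test_fraction of the sentences becomes the test set,
    matching the 0:i / i: split used above.
    """
    data = list(sentences)                     # one full pass over the input
    i = math.floor(len(data) * test_fraction)
    return data[:i], data[i:]                  # list slices are new lists

corpus = [[("word%d" % n, "NOUN")] for n in range(10)]  # toy data
testing, training = split_train_test(corpus)
random.shuffle(training)                       # works: training is a list
print(len(testing), len(training))
```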

Now tagging works correctly.

>>> perceptron_tagger_preds = [] 
>>> for test_sentence in testing_sentences: 
...     perceptron_tagger_preds.append(perceptron_tagger.tag([word for word, _ in test_sentence])) 
... 
>>> print(perceptron_tagger_preds[676]) 
[('Formula', 'NOUN'), ('is', 'VERB'), ('due', 'ADJ'), ('this', 'DET'), ('week', 'NOUN')]
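With predictions and gold-standard test sentences in hand, a natural next step is per-token accuracy. The function below, tag_accuracy, is a hypothetical plain-Python sketch; it assumes predicted and gold sentences are aligned one-to-one, as they are when each prediction comes from tagging the words of the matching test sentence. The example data is a toy, not real output.

```python
def tag_accuracy(gold_sentences, predicted_sentences):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for gold, predicted in zip(gold_sentences, predicted_sentences):
        for (_, gold_tag), (_, predicted_tag) in zip(gold, predicted):
            correct += gold_tag == predicted_tag
            total += 1
    return correct / total if total else 0.0

# Toy data: one of the three tags differs.
gold = [[("Formula", "NOUN"), ("is", "VERB"), ("due", "ADJ")]]
predicted = [[("Formula", "NOUN"), ("is", "VERB"), ("due", "ADV")]]
print(tag_accuracy(gold, predicted))
```

With the variables from the session above, this would be called as tag_accuracy(testing_sentences, perceptron_tagger_preds).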


2016-09-22 12:21:56
