NP chunker ValueError (Python NLTK)

I'm building an NLP pipeline based on the Python NLTK book (chapter 7). The first part of the code preprocesses the data correctly, but I can't run its output through my NP chunker:

import nltk, re, pprint 

#Import Data 

data = 'This is a test sentence to check if preprocessing works' 

#Preprocessing 

def preprocess(document): 
    sentences = nltk.sent_tokenize(document) 
    sentences = [nltk.word_tokenize(sent) for sent in sentences] 
    sentences = [nltk.pos_tag(sent) for sent in sentences] 
    return(sentences) 

tagged = preprocess(data) 
print(tagged) 

#regular expression-based NP chunker 

grammar = "NP: {<DT>?<JJ>*<NN>}" 
cp = nltk.RegexpParser(grammar) #chunk parser 
chunked = [] 
for s in tagged: 
    chunked.append(cp.parse(tagged)) 
print(chunked)

This is the traceback I get:

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile 
    execfile(filename, namespace) 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile 
    exec(compile(f.read(), filename, 'exec'), namespace) 
    File "C:/Users/u0084411/Box Sync/Procesmanager DH/Text Mining/Tools/NLP_pipeline.py", line 24, in <module> 
    chunked.append(cp.parse(tagged)) 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 1202, in parse 
    chunk_struct = parser.parse(chunk_struct, trace=trace) 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 1017, in parse 
    chunkstr = ChunkString(chunk_struct) 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 95, in __init__ 
    tags = [self._tag(tok) for tok in self._pieces] 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 95, in <listcomp> 
    tags = [self._tag(tok) for tok in self._pieces] 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 105, in _tag 
    raise ValueError('chunk structures must contain tagged ' 
ValueError: chunk structures must contain tagged tokens or trees 
>>>

What is my mistake here? `tagged` is tagged, so why doesn't the program recognize it?

Many thanks! Tom

Source

2017-03-10 Tom Willaert

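See [Why am I getting error? ValueError: chunk structures must contain tagged tokens or trees](http://stackoverflow.com/questions/13269543/why-am-i-getting-error-valueerror-chunk-structures-must-contain-tagged-tokens).

I have already implemented this, but I get the same traceback.

Each item you pass to the parser must be a tagged tuple or a tree. See http://www.nltk.org/_modules/nltk/chunk/regexp.html.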

You'll slap your forehead when you see this. Instead of this:

for s in tagged: 
    chunked.append(cp.parse(tagged))

it should be this:

for s in tagged: 
    chunked.append(cp.parse(s))

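You got the error because you were passing `cp.parse()` not a single tagged sentence but the whole list of them. The loop variable `s` holds one sentence (a list of `(word, tag)` tuples), while `tagged` is a list of such lists, so the parser found lists where it expected tagged tokens.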
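To make the difference concrete, here is a minimal sketch that reproduces both cases. It assumes a hand-tagged sentence (hypothetical data, not the asker's output) so that it runs without downloading the tokenizer or tagger models; only `nltk.RegexpParser` from the question is used.

```python
import nltk

# Hypothetical, hand-tagged input: a list of sentences,
# each sentence being a list of (word, tag) tuples.
tagged = [[("the", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ")]]

grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)  # chunk parser

# Wrong: passing the whole list of sentences raises the ValueError.
try:
    cp.parse(tagged)
    error_message = None
except ValueError as err:
    error_message = str(err)

# Right: parse one sentence at a time.
chunked = [cp.parse(s) for s in tagged]

print(error_message)
print(chunked[0])
```

The list comprehension is equivalent to the corrected `for` loop above; each element of `chunked` is an `nltk.Tree` for one sentence.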

Source

2017-03-10 19:51:11 alexis

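OK, many thanks, now I get some results, but I'm not sure how to interpret them: `[Tree('S', [('TH', 'NNP'), Tree('NP', [('sentence', 'NN')]), ('contains', 'VBZ'), ('one', 'CD'), Tree('NP', [('noun', 'NN')]), Tree('NP', [('phrase', 'NN')])])]`

I can't call `chunked.draw` to get a visual representation; the traceback gives me: AttributeError: 'list' object has no attribute 'draw'

Are you typing `chunked.draw()`? That is a list (the list _you_ defined) of trees. Try `chunked[0].draw()`. – alexis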
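On interpreting the result: each element of `chunked` is an `nltk.Tree` whose root is labelled `S`, with the matched noun phrases as `NP` subtrees and all other tokens left as plain `(word, tag)` tuples. A minimal sketch of pulling the noun phrases out without opening the drawing window follows; the tree is built by hand to resemble the output quoted above, so the data is illustrative and not the asker's actual result.

```python
from nltk import Tree

# Hand-built tree resembling the chunker output quoted above (illustrative data).
tree = Tree("S", [
    ("This", "DT"),
    Tree("NP", [("sentence", "NN")]),
    ("contains", "VBZ"),
    ("one", "CD"),
    Tree("NP", [("noun", "NN")]),
    Tree("NP", [("phrase", "NN")]),
])
chunked = [tree]  # same shape as the list built in the question

# Collect the words of every NP chunk in every sentence.
noun_phrases = []
for sentence_tree in chunked:
    for subtree in sentence_tree.subtrees(filter=lambda t: t.label() == "NP"):
        words = [word for word, tag in subtree.leaves()]
        noun_phrases.append(" ".join(words))

print(noun_phrases)
```

`print(chunked[0])` gives a bracketed text rendering of the same tree, which can be handy when no display is available for `draw()`.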
