2017-03-10

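NP-chunker ValueError (Python nltk)

I am building an NLP pipeline based on the Python NLTK book (chapter 7). The first piece of code preprocesses the data correctly, but I cannot run its output through my NP chunker: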

import nltk, re, pprint 

#Import Data 

data = 'This is a test sentence to check if preprocessing works' 

#Preprocessing 

def preprocess(document): 
    sentences = nltk.sent_tokenize(document) 
    sentences = [nltk.word_tokenize(sent) for sent in sentences] 
    sentences = [nltk.pos_tag(sent) for sent in sentences] 
    return(sentences) 

tagged = preprocess(data) 
print(tagged) 

#regular expression-based NP chunker 

grammar = "NP: {<DT>?<JJ>*<NN>}" 
cp = nltk.RegexpParser(grammar) #chunk parser 
chunked = [] 
for s in tagged: 
    chunked.append(cp.parse(tagged)) 
print(chunked) 

This is the traceback I get:

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile 
    execfile(filename, namespace) 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile 
    exec(compile(f.read(), filename, 'exec'), namespace) 
    File "C:/Users/u0084411/Box Sync/Procesmanager DH/Text Mining/Tools/NLP_pipeline.py", line 24, in <module> 
    chunked.append(cp.parse(tagged)) 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 1202, in parse 
    chunk_struct = parser.parse(chunk_struct, trace=trace) 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 1017, in parse 
    chunkstr = ChunkString(chunk_struct) 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 95, in __init__ 
    tags = [self._tag(tok) for tok in self._pieces] 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 95, in <listcomp> 
    tags = [self._tag(tok) for tok in self._pieces] 
    File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 105, in _tag 
    raise ValueError('chunk structures must contain tagged ' 
ValueError: chunk structures must contain tagged tokens or trees 
>>> 

What is my mistake here? `tagged` is tagged, so why doesn't the program recognize that?

Many thanks! Tom

+0

See [Why am I getting error? ValueError: chunk structures must contain tagged tokens or trees](http://stackoverflow.com/questions/13269543/why-am-i-getting-error-valueerror-chunk-structures-must-contain-tagged-tokens).

+0

I have implemented that, but I get the same traceback.

+0

Your tagged tokens must be tuples or trees. See http://www.nltk.org/_modules/nltk/chunk/regexp.html.
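To make that requirement concrete, here is a minimal sketch in plain Python. The helper `is_tagged_sentence` is hypothetical (it is not part of NLTK), the tags are written by hand rather than produced by `nltk.pos_tag`, and the check only covers tuples, not trees. It shows that the nested list returned by `preprocess` is a list of sentences, so its items are lists rather than `(word, tag)` tuples.

```python
# Shape of what preprocess() returns: a list of sentences,
# where each sentence is a list of (word, tag) tuples.
# The tags below are hand-written for illustration only.
tagged = [[("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("test", "NN")]]


def is_tagged_sentence(sentence):
    """Hypothetical helper: True if every item is a (word, tag) tuple."""
    return all(isinstance(tok, tuple) and len(tok) == 2 for tok in sentence)


# The outer list holds whole sentences (lists), so this check fails.
print(is_tagged_sentence(tagged))

# A single sentence holds (word, tag) tuples, so this check passes.
print(is_tagged_sentence(tagged[0]))
```

A check like this can be run on whatever is about to be handed to `cp.parse()` to see at which nesting level the data sits.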

Answers

1

You'll slap your forehead when you see this. Instead of

for s in tagged: 
    chunked.append(cp.parse(tagged)) 

it should be:

for s in tagged: 
    chunked.append(cp.parse(s)) 

You got the error because you were passing `cp.parse()` not a single tagged sentence but the whole list of them.
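For reference, a self-contained sketch of the corrected loop. The sentence is tagged by hand here (an assumption, so the example runs without downloading tokenizer or tagger data); in the real pipeline `tagged` would come from `preprocess(data)`.

```python
import nltk

# Hand-tagged stand-in for the output of preprocess(); the tags are
# assumptions made for illustration, not the output of nltk.pos_tag.
tagged = [[("This", "DT"), ("is", "VBZ"), ("a", "DT"),
           ("test", "NN"), ("sentence", "NN")]]

grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)  # chunk parser

# Parse one sentence at a time: each s is a list of (word, tag) tuples.
chunked = [cp.parse(s) for s in tagged]

for tree in chunked:
    print(tree)
```

The list comprehension does the same thing as the `for` loop with `append`; either way, `chunked` ends up as a list with one tree per sentence.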

+0

OK, many thanks. Now I get some results, but I'm not sure how to interpret them: [Tree('S', [('TH', 'NNP'), Tree('NP', [('sentence', 'NN')]), ('contains', 'VBZ'), ('one', 'CD'), Tree('NP', [('noun', 'NN')]), Tree('NP', [('phrase', 'NN')])])]

+0

I can't call chunked.draw to get a visual representation; the traceback gives me: AttributeError: 'list' object has no attribute 'draw'

+0

You are typing `chunked.draw()`? It's a list (a list _you_ defined) of trees. Try `chunked[0].draw()`. – alexis
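If a text view is enough, the trees can also be read without opening a drawing window. The sketch below builds a hypothetical `chunked` list by hand, modelled on the result quoted in the comment above (the words and tags are illustrative assumptions), and collects the words of every `NP` subtree.

```python
from nltk import Tree

# Hypothetical chunker output: a list holding one tree per sentence.
# Top-level tuples are unchunked tokens; nested Tree('NP', ...) nodes are chunks.
chunked = [
    Tree("S", [
        ("This", "DT"),
        Tree("NP", [("sentence", "NN")]),
        ("contains", "VBZ"),
        ("one", "CD"),
        Tree("NP", [("noun", "NN")]),
        Tree("NP", [("phrase", "NN")]),
    ])
]

noun_phrases = []
for tree in chunked:
    # subtrees() walks the tree; the filter keeps only the NP chunks.
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
        # leaves() returns the (word, tag) tuples inside the chunk.
        noun_phrases.append(" ".join(word for word, tag in subtree.leaves()))

print(noun_phrases)  # ['sentence', 'noun', 'phrase']
```

Calling `print(tree)` on an individual tree gives a bracketed text rendering of the same structure.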