NLTK ne_chunk without pos_tag

I'm trying to chunk a sentence using ne_chunk and pos_tag in nltk.

from nltk import tag 
from nltk.tag import pos_tag 
from nltk.tree import Tree 
from nltk.chunk import ne_chunk 

sentence = "Michael and John is reading a booklet in a library of Jakarta" 
tagged_sent = pos_tag(sentence.split()) 

print_chunk = [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)] 

print(print_chunk)

and this is the result:

[Tree('GPE', [('Michael', 'NNP')]), Tree('PERSON', [('John', 'NNP')]), Tree('GPE', [('Jakarta', 'NNP')])]

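My question: is it possible to leave out the pos_tag (like NNP above) and include only the Tree labels 'GPE', 'PERSON'? And what does 'GPE' mean?

Thanks in advance.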


2017-05-29 sang

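The named entity chunker gives you a tree that contains both the chunks and the tags. You can't change that, but you can take the tags out. Starting from your tagged_sent: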

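import nltk
from nltk.tree import Tree

chunks = nltk.ne_chunk(tagged_sent)
simple = []
for elt in chunks:
    if isinstance(elt, Tree):
        # Named-entity chunk: keep the label, drop the POS tags.
        simple.append(Tree(elt.label(), [word for word, tag in elt]))
    else:
        # Plain (word, tag) token: keep only the word.
        simple.append(elt[0])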

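If you only want the chunks, omit the else: clause above. You can adapt the code to wrap the chunks any way you like; I used an nltk Tree to keep the changes to a minimum. Note that some chunks consist of more than one word (try adding "New York" to your example), so a chunk's contents must be a list, not a single element.

PS. "GPE" stands for "geo-political entity" (tagging Michael as one is obviously a chunker mistake). You can see a list of "commonly used tags" in the nltk book.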
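As a rough illustration of the multi-word point, here is a minimal sketch that returns each chunk as a (label, list of words) pair. So that it runs without NLTK or its models, it uses a stand-in class, FakeTree, which is hypothetical and mimics only the two Tree features the loop relies on (iteration and label()); the helper name chunks_without_tags and the hand-built sample are likewise assumptions for illustration, not real chunker output.

```python
# Stand-in for nltk.tree.Tree so the sketch runs without NLTK installed
# (hypothetical helper, not part of NLTK). It mimics only what is used
# below: a list of children plus a label() method.
class FakeTree(list):
    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


def chunks_without_tags(tree):
    """Return (label, [words]) for each chunk, dropping the POS tags."""
    result = []
    for node in tree:
        # Plain tokens are (word, tag) tuples; chunks are labelled subtrees.
        if isinstance(node, tuple):
            continue
        words = [word for word, _tag in node]
        result.append((node.label(), words))
    return result


# Hand-built tree shaped like ne_chunk output; the labels are made up
# for illustration, not produced by the real chunker.
sample = FakeTree("S", [
    FakeTree("PERSON", [("John", "NNP")]),
    ("lives", "VBZ"),
    ("in", "IN"),
    FakeTree("GPE", [("New", "NNP"), ("York", "NNP")]),
])

print(chunks_without_tags(sample))
# [('PERSON', ['John']), ('GPE', ['New', 'York'])]
```

With real output from ne_chunk, the same function should work unchanged, since the chunk nodes there are Tree objects and the remaining tokens are plain (word, tag) tuples.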


2017-05-29 09:33:35 alexis

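Thanks, it works! But how can I train some special NEs? For example, Michael should be 'PERSON' rather than 'GPE', because it is a person's name. – sang

Read the nltk book. Then ask a new question here if you're still wondering. In short, you could add a dictionary of person names to override the statistics, but in general there's not a lot you can do. If you try to fix too much by hand, you'll break more than you fix. (E.g., is "Elizabeth" a person or a city in New Jersey?) – alexis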
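The name-dictionary idea could look roughly like the sketch below. It is an assumption-laden illustration, not an NLTK feature: PERSON_NAMES and override_labels are hypothetical names, and the input is a plain list of (label, words) pairs written out by hand from the result in the question rather than produced by the chunker.

```python
# Hypothetical gazetteer of known person names; in practice this would
# come from your own data.
PERSON_NAMES = {"Michael", "John"}


def override_labels(chunks, person_names=PERSON_NAMES):
    """Relabel a chunk as PERSON when all of its words are known names.

    `chunks` is a list of (label, [words]) pairs.
    """
    fixed = []
    for label, words in chunks:
        if words and all(word in person_names for word in words):
            label = "PERSON"
        fixed.append((label, words))
    return fixed


# Labels copied by hand from the chunker result shown in the question.
chunks = [("GPE", ["Michael"]), ("PERSON", ["John"]), ("GPE", ["Jakarta"])]
print(override_labels(chunks))
# [('PERSON', ['Michael']), ('PERSON', ['John']), ('GPE', ['Jakarta'])]
```

The override is unconditional, which is exactly the risk raised in the comment: a name like "Elizabeth" in the dictionary would be forced to PERSON even where the text means the city.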

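Most likely, the code from https://stackoverflow.com/a/31838373/610569 with a slight modification, plus the label, is what you need.

Is it possible to leave out the pos_tag (like NNP above) and include only the Tree labels 'GPE', 'PERSON'?

Yes, simply traverse the Tree object =) See "How to Traverse an NLTK Tree object?"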

>>> from nltk import Tree, pos_tag, ne_chunk 
>>> sentence = "Michael and John is reading a booklet in a library of Jakarta" 
>>> tagged_sent = ne_chunk(pos_tag(sentence.split())) 
>>> tagged_sent 
Tree('S', [Tree('GPE', [('Michael', 'NNP')]), ('and', 'CC'), Tree('PERSON', [('John', 'NNP')]), ('is', 'VBZ'), ('reading', 'VBG'), ('a', 'DT'), ('booklet', 'NN'), ('in', 'IN'), ('a', 'DT'), ('library', 'NN'), ('of', 'IN'), Tree('GPE', [('Jakarta', 'NNP')])]) 

>>> from nltk.sem.relextract import NE_CLASSES 
>>> ace_tags = NE_CLASSES['ace'] 

>>> for node in tagged_sent:
...  if isinstance(node, Tree) and node.label() in ace_tags:
...   words, tags = zip(*node.leaves())
...   print(node.label() + '\t' + ' '.join(words))
...
GPE Michael 
PERSON John 
GPE Jakarta

What does 'GPE' mean?

GPE means "Geo-Political Entity".

The GPE tag comes from the ACE dataset.
There are two pre-trained NE chunkers available, see https://github.com/nltk/nltk/blob/develop/nltk/chunk/init.py#L164
There are 3 supported tag sets: https://github.com/nltk/nltk/blob/develop/nltk/sem/relextract.py#L31
For a detailed explanation, see "NLTK relation extraction returns nothing".


2017-05-29 10:37:50 alvas
