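How do I put keywords into NLTK tokenize?

I have set "call of duty" as a keyword, and I want this phrase to come out of tokenization as a single token.

The result I want in the end: ['My', 'favorite', 'game', 'is', 'call of duty']

So, how do I set keywords like this in Python NLP?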

2017-04-24 blackbaka

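Why would it be a token? You want it to be recognized as an entity, not as a token. – erip

I think what you want is keyphrase extraction. One way to do it is to first tag every word with its PoS tag, and then apply some kind of regular expression over the PoS tags to join the interesting words into keyphrases.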

import nltk
from nltk import pos_tag
from nltk import tokenize

# Requires the NLTK tokenizer and POS-tagger data (see nltk.download()).


def extract_phrases(my_tree, phrase):
    """Recursively collect every subtree of my_tree labelled `phrase`."""
    my_phrases = []
    if my_tree.label() == phrase:
        my_phrases.append(my_tree.copy(True))

    for child in my_tree:
        if isinstance(child, nltk.Tree):
            my_phrases.extend(extract_phrases(child, phrase))

    return my_phrases


def main():
    sentences = ["My favorite game is call of duty"]

    # Chunk grammar: an optional determiner, any adjectives and a noun,
    # or a run of proper nouns.
    grammar = "NP: {<DT>?<JJ>*<NN>|<NNP>*}"
    cp = nltk.RegexpParser(grammar)

    for sentence in sentences:
        tagged = pos_tag(tokenize.word_tokenize(sentence))
        tree = cp.parse(tagged)
        print("\nNoun phrases:")
        for phrase in extract_phrases(tree, 'NP'):
            # Print the chunk followed by its words joined with "_".
            print(phrase, "_".join(word for word, tag in phrase.leaves()))


if __name__ == "__main__":
    main()

This will output the following:

Noun phrases: 
(NP favorite/JJ game/NN) favorite_game 
(NP call/NN) call 
(NP duty/NN) duty

However, you can play around with

grammar = "NP: {<DT>?<JJ>*<NN>|<NNP>*}"

and try other kinds of expressions until you get what you want, depending on which words/tags you want to join together.
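As one illustration of such a variation, the sketch below adds a rule that chunks a noun-preposition-noun sequence before the ordinary noun-phrase rule runs, so that "call of duty" comes out as one phrase. The PoS tags are hard-coded here, which is an assumption made so the example runs without any tagger model: the tags for favorite, game, call and duty are taken from the output above, the remaining ones are assumed. The `<NN><IN><NN>` rule is only an example and will also merge unrelated word sequences of the same shape.

```python
import nltk

# Assumption: tags are hard-coded instead of being produced by pos_tag().
tagged = [
    ("My", "PRP$"), ("favorite", "JJ"), ("game", "NN"), ("is", "VBZ"),
    ("call", "NN"), ("of", "IN"), ("duty", "NN"),
]

# Rules inside one stage are applied in order, and a later rule
# cannot re-chunk tokens that an earlier rule has already claimed.
grammar = r"""
NP: {<NN><IN><NN>}      # noun + preposition + noun, e.g. "call of duty"
    {<DT>?<JJ>*<NN>}    # optional determiner, adjectives, then a noun
"""
cp = nltk.RegexpParser(grammar)
tree = cp.parse(tagged)

# Collect the words of every NP chunk, joined with spaces.
phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(phrases)
```

With these tags the two chunks found are "favorite game" and "call of duty". On real text the result depends on what the tagger assigns to each word.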

Also, if you are interested, have a look at this very good introduction to keyword/keyphrase extraction:

https://bdewilde.github.io/blog/2014/09/23/intro-to-automatic-keyphrase-extraction/


2017-04-25 12:27:39

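This is, of course, too late to be useful to the OP, but I thought I would put this answer here for others:

It sounds like what you may really be asking is: how do I make sure that compound phrases such as 'call of duty' get grouped together as a single token?

You can use NLTK's multi-word expression tokenizer, like so: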

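import nltk

string = 'My favorite game is call of duty'
tokenized_string = nltk.word_tokenize(string)

mwe = [('call', 'of', 'duty')]
# The default separator is '_'; pass ' ' to join the words with a space.
mwe_tokenizer = nltk.tokenize.MWETokenizer(mwe, separator=' ')
tokenized_string = mwe_tokenizer.tokenize(tokenized_string)

Here mwe stands for multi-word expression. The value of tokenized_string will be ['My', 'favorite', 'game', 'is', 'call of duty']. With the default separator, the last token would be 'call_of_duty' instead.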
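If there are several keywords, the same idea can be wrapped in a small helper. This is only a sketch: `build_keyword_tokenizer` is a hypothetical name, "grand theft auto" is a made-up second keyword, and the sentence is split on whitespace with `str.split()` rather than `nltk.word_tokenize` so that no tokenizer data is needed.

```python
from nltk.tokenize import MWETokenizer


def build_keyword_tokenizer(keywords, separator=" "):
    """Return an MWETokenizer that merges each keyword phrase into one token.

    Hypothetical helper: `keywords` is a list of plain strings such as
    "call of duty"; each one is split on whitespace and registered.
    """
    tokenizer = MWETokenizer(separator=separator)
    for keyword in keywords:
        tokenizer.add_mwe(tuple(keyword.split()))
    return tokenizer


keyword_tokenizer = build_keyword_tokenizer(["call of duty", "grand theft auto"])

# Assumption: plain whitespace splitting stands in for nltk.word_tokenize.
tokens = keyword_tokenizer.tokenize("My favorite game is call of duty".split())
print(tokens)  # ['My', 'favorite', 'game', 'is', 'call of duty']
```

Matching is exact and case-sensitive, so "Call of Duty" would not be merged unless it is registered in that form or the tokens are lowercased first.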


2017-10-07 03:27:50
