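NLTK collocations for specific words

I know how to get bigram and trigram collocations using NLTK, and I can apply them to my own corpus. The code is below.

However, I am not sure (1) how to get the collocations for a specific word, and (2) whether NLTK has a collocation measure based on the log-likelihood ratio.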

import nltk 
from nltk.collocations import * 
from nltk.tokenize import word_tokenize 

text = "this is a foo bar bar black sheep foo bar bar black sheep foo bar bar black sheep shep bar bar black sentence" 

trigram_measures = nltk.collocations.TrigramAssocMeasures() 
finder = TrigramCollocationFinder.from_words(word_tokenize(text)) 

for i in finder.score_ngrams(trigram_measures.pmi): 
    print(i)


2014-01-16 Sabba

Try this code:

import nltk 
from nltk.collocations import * 
bigram_measures = nltk.collocations.BigramAssocMeasures() 
trigram_measures = nltk.collocations.TrigramAssocMeasures() 

# Ngrams with 'creature' as a member 
creature_filter = lambda *w: 'creature' not in w 


## Bigrams 
finder = BigramCollocationFinder.from_words(
    nltk.corpus.genesis.words('english-web.txt')) 
# only bigrams that appear 3+ times 
finder.apply_freq_filter(3) 
# only bigrams that contain 'creature' 
finder.apply_ngram_filter(creature_filter) 
# print the 10 n-grams with the highest likelihood ratio
print(finder.nbest(bigram_measures.likelihood_ratio, 10))


## Trigrams 
finder = TrigramCollocationFinder.from_words(
    nltk.corpus.genesis.words('english-web.txt')) 
# only trigrams that appear 3+ times 
finder.apply_freq_filter(3) 
# only trigrams that contain 'creature' 
finder.apply_ngram_filter(creature_filter) 
# print the 10 n-grams with the highest likelihood ratio
print(finder.nbest(trigram_measures.likelihood_ratio, 10))

It uses the likelihood-ratio measure and filters out the n-grams that do not contain the word 'creature'.


2014-01-17 11:54:31 bogs

As for question 2, yes! NLTK includes the likelihood ratio among its association measures. The first question is still unanswered!

http://nltk.org/api/nltk.metrics.html?highlight=likelihood_ratio#nltk.metrics.association.NgramAssocMeasures.likelihood_ratio
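As a minimal sketch of how that measure can be used, the example below scores the bigrams of the question's sample text with `likelihood_ratio`. Splitting on whitespace with `str.split` instead of `word_tokenize` is an assumption made here so that no tokenizer data has to be downloaded.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

text = ("this is a foo bar bar black sheep foo bar bar black sheep "
        "foo bar bar black sheep shep bar bar black sentence")

# Assumption: plain whitespace splitting stands in for word_tokenize.
tokens = text.split()

bigram_measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens)

# score_ngrams returns (ngram, score) pairs, highest score first.
scored = finder.score_ngrams(bigram_measures.likelihood_ratio)

# Show the five highest-scoring bigrams.
for bigram, score in scored[:5]:
    print(bigram, round(score, 2))
```

The same call works on a `TrigramCollocationFinder` with `TrigramAssocMeasures`, as in the answer above.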


2014-01-17 03:57:58 Sabba

Question 1 - try:

target_word = "electronic" # your choice of word 
finder.apply_ngram_filter(lambda w1, w2, w3: target_word not in (w1, w2, w3)) 
for i in finder.score_ngrams(trigram_measures.likelihood_ratio): 
    print(i)

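The idea is to filter out whatever you don't want. This method is normally used to filter on words in specific positions of the n-gram, and you can adapt it to your needs.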
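Filtering on a specific position could look like the sketch below, which keeps only the bigrams whose first word is the target. The sample text comes from the question; the target word `"bar"` and the use of `str.split` for tokenizing are assumptions chosen for illustration.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

text = ("this is a foo bar bar black sheep foo bar bar black sheep "
        "foo bar bar black sheep shep bar bar black sentence")

# Assumption: plain whitespace splitting stands in for word_tokenize.
tokens = text.split()
target_word = "bar"  # hypothetical choice of word

bigram_measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens)

# apply_ngram_filter drops every n-gram for which the function returns True,
# so this keeps only the bigrams that start with the target word.
finder.apply_ngram_filter(lambda w1, w2: w1 != target_word)

results = finder.score_ngrams(bigram_measures.likelihood_ratio)
for bigram, score in results:
    print(bigram, round(score, 2))
```

Changing the test to `w2 != target_word` would keep the bigrams that end with the target instead.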


2014-01-17 04:22:01 dmvianna
