2014-05-07
Using an NLTK Conditional Frequency Distribution and the nltk.bigrams function, train a bigram model on the Genesis corpus:

import nltk

text = nltk.corpus.genesis.words('english-kjv.txt')
bigrams = nltk.bigrams(text)
cfd = nltk.ConditionalFreqDist(bigrams)  # cfd[w1][w2] = count of w1 followed by w2
Answer the following questions:

What is the Probability of ‘begining’ given ‘the’? 
What is the probability of ‘the’? 

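Note: the probabilities you give as answers must be ones that can be computed from this corpus. What is the probability of 'beginning' given 'the'?

Hi, can you help me? This is from the NLTK book. When I work it out I get 78%, which makes no sense. I tried to compute it in Python.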


Zero; that's not how 'beginning' is spelled :) – hobbs


Oops! So what then? I still get 78. – user3563184

Answer


There is a difference between the probability of 'beginning' intersect 'the' (the joint probability):

p('beginning','the')

and the probability of 'beginning' given 'the':

p('beginning'|'the') = p('beginning','the')/p('the')
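With maximum-likelihood estimates taken from corpus counts, the totals cancel. In the sketch below, C(·) is a count in the corpus, the joint event is "'the' immediately followed by 'beginning'", and N is the number of tokens (there are N − 1 bigrams, hence the ≈):

```latex
p(\text{beginning} \mid \text{the})
  = \frac{p(\text{the},\ \text{beginning})}{p(\text{the})}
  \approx \frac{C(\text{the beginning}) / N}{C(\text{the}) / N}
  = \frac{C(\text{the beginning})}{C(\text{the})}
```

So the answer is a ratio of two counts, and since "the beginning" cannot occur more often than "the", it cannot exceed 1.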

Try:

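from collections import Counter

import nltk

text = nltk.corpus.genesis.words('english-kjv.txt')
bigrams = nltk.bigrams(text)  # generator of ordered (w1, w2) tuples
cfd_bigrams = Counter(bigrams)  # bigram counts
cfd_unigrams = Counter(text)  # word counts

# Joint probability: count of ('said', 'unto') over the total number of bigrams
print("p('said','unto') =", cfd_bigrams['said', 'unto'] / sum(cfd_bigrams.values()))

# Note: this divides the joint probability by the raw count of 'unto', not by p('unto')
print("p('said'|'unto') =", (cfd_bigrams['said', 'unto'] / sum(cfd_bigrams.values())) / cfd_unigrams['unto'])

# Raw count (not a probability) of 'beginning' immediately followed by 'the'
print("p('beginning','the') =", cfd_bigrams['beginning', 'the'])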

[out]:

p('said','unto') = 0.00397649844738 
p('said'|'unto') = 6.73982787691e-06 
p('beginning','the') = 0
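The 0 on the last line comes from word order: nltk.bigrams yields ordered (w1, w2) tuples, and the phrase in the text is "the beginning", so the bigram to look up is ('the', 'beginning'), not ('beginning', 'the'). The second line is also not a conditional probability, because it divides by the raw count of 'unto' instead of by p('unto'). Below is a minimal standard-library sketch of the calculation, run on a toy token list rather than the real corpus; the sentence and the helper names bigram_counts, conditional and unigram are illustrative assumptions, not NLTK functions.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count ordered (w1, w2) pairs, like Counter(nltk.bigrams(tokens))."""
    return Counter(zip(tokens, tokens[1:]))


def conditional(counts, prev, nxt):
    """Estimate P(nxt | prev) = p(prev, nxt) / p(prev).

    Both terms are probabilities over the same bigram total, so the result
    equals count(prev, nxt) / count(prev, *) and always lies in [0, 1].
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    joint = counts[prev, nxt] / total
    # p(prev): share of bigrams whose first word is prev
    marginal = sum(c for (w1, _), c in counts.items() if w1 == prev) / total
    return joint / marginal if marginal else 0.0


def unigram(tokens, word):
    """Estimate P(word) = count(word) / number of tokens."""
    return Counter(tokens)[word] / len(tokens) if tokens else 0.0


# Hypothetical toy stand-in for list(nltk.corpus.genesis.words('english-kjv.txt'))
tokens = "in the beginning god created the heaven and the earth".split()
counts = bigram_counts(tokens)
print(counts['the', 'beginning'], counts['beginning', 'the'])  # prints: 1 0 (tuple order matters)
print(conditional(counts, 'the', 'beginning'))  # P('beginning' | 'the')
print(unigram(tokens, 'the'))                   # P('the')
```

On the real corpus you would pass list(nltk.corpus.genesis.words('english-kjv.txt')) as tokens. With the NLTK objects from the question, cfd['the'].freq('beginning') and nltk.FreqDist(text).freq('the') should compute the same two estimates.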