Python Maxent classifier

I have been using the maxent classifier in Python and it keeps failing, and I don't understand why.

I am using the movie reviews corpus. (Total newbie.)

import nltk.classify.util 
from nltk.classify import MaxentClassifier 
from nltk.corpus import movie_reviews 

def word_feats(words): 
    return dict([(word, True) for word in words]) 

negids = movie_reviews.fileids('neg') 
posids = movie_reviews.fileids('pos') 

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids] 
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids] 

# Integer division: slice indices must be ints (this also works on Python 3)
negcutoff = len(negfeats) * 3 // 4 
poscutoff = len(posfeats) * 3 // 4 

trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff] 
classifier = MaxentClassifier.train(trainfeats)

Here is the error (I know I'm doing this wrong; please link to an explanation of how the Maxent model works):

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 1334
    sum1 = numpy.sum(exp_nf_delta * A, axis=0)
RuntimeWarning: invalid value encountered in multiply

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 1335
    sum2 = numpy.sum(nf_exp_nf_delta * A, axis=0)
RuntimeWarning: invalid value encountered in multiply

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 1341
    deltas -= (ffreq_empirical - sum1) / -sum2
RuntimeWarning: invalid value encountered in divide
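These are numpy's standard floating-point warnings. One plausible way to trigger them (an assumption, not a trace of NLTK's internals) is an `exp` that overflows to `inf`, followed by `inf * 0` or `inf / inf`, both of which produce `nan`. The sketch below reproduces the same kinds of messages with plain numpy; the function name `reproduce_maxent_warnings` is hypothetical.

```python
import warnings

import numpy as np


def reproduce_maxent_warnings():
    """Hypothetical reproduction of the RuntimeWarnings seen in maxent.py:
    exp overflows to inf, and inf * 0 or inf / inf then yields nan."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning instead of printing it
        with np.errstate(over="warn", invalid="warn"):
            big = np.exp(np.array([1000.0]))  # overflow: the result is inf
            product = big * np.array([0.0])  # inf * 0 -> nan ("invalid value ... multiply")
            ratio = big / big  # inf / inf -> nan ("invalid value ... divide")
    messages = [str(w.message) for w in caught]
    return big, product, ratio, messages


big, product, ratio, messages = reproduce_maxent_warnings()
print(messages)
```

Once a `nan` enters an update like the `deltas -= ...` line above, it propagates into every value computed from it, which is why training degenerates rather than failing with an exception.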


2013-04-13 cjds

Could [this](http://stackoverflow.com/questions/9140744/numpy-error-invalid-value-encountered-in-power) be the same problem?

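By the way, some of NLTK's maxent training algorithms rely on scipy's maxent implementation, which was removed from scipy starting with version 0.11 (http://docs.scipy.org/doc/scipy-0.10.1/reference/maxentropy.html). Those algorithms may not work in NLTK.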

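There may be a way to fix the numpy overflow problem, but since this is just a movie-review classifier for learning NLTK and text classification (and you probably don't want training to take a long time anyway), I'll offer a simple workaround: restrict the words used in the feature set.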
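Since the question also asks how a maxent model works: at prediction time, a maxent (multinomial logistic regression) classifier gives each label a score equal to the sum of the weights of the features that fire for that label, then normalizes with exp, so P(label | x) = exp(score(label)) / Σ exp(score(label')). The pure-Python sketch below shows that computation with made-up weights; it is not NLTK's implementation, the `(feature, label) -> weight` layout is an assumption for illustration, and it covers prediction only, not GIS/IIS training. Subtracting the largest score before calling `exp` is the standard guard against overflow.

```python
import math


def maxent_prob(weights, features, labels):
    """P(label | features) for a toy maxent model.

    weights:  dict mapping (feature_name, label) -> float (hypothetical layout)
    features: iterable of active binary feature names
    labels:   iterable of possible labels
    """
    features = list(features)
    # Score each label: sum of the weights of the (feature, label) pairs that fire
    scores = {lab: sum(weights.get((f, lab), 0.0) for f in features) for lab in labels}
    # Subtract the max score before exponentiating so exp() cannot overflow
    top = max(scores.values())
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    z = sum(exps.values())  # normalizer
    return {lab: e / z for lab, e in exps.items()}


weights = {("great", "pos"): 2.0, ("awful", "neg"): 2.0}
probs = maxent_prob(weights, ["great", "film"], ["pos", "neg"])
# pos ≈ 0.881, neg ≈ 0.119
```

Without the max subtraction, a weight of 1000 would make `math.exp(1000)` raise `OverflowError` in pure Python; numpy instead returns `inf` with a warning, which is how `nan` can creep into later arithmetic.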

You can find the 300 most common words across all the reviews like this (you can obviously make the number higher if you want):

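# most_common(300) returns the 300 most frequent (word, count) pairs; on NLTK 3,
# FreqDist.keys() is not sorted by frequency and cannot be sliced on Python 3
all_words = nltk.FreqDist(movie_reviews.words()) 
top_words = set(word for word, count in all_words.most_common(300))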
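In NLTK 3, `FreqDist` is a subclass of `collections.Counter`, so its `keys()` are not ordered by frequency (and a keys view cannot be sliced in Python 3); `most_common(n)` is the reliable way to get the top n. A minimal sketch with a plain `Counter` on a made-up token list (the helper name `top_n_words` is hypothetical):

```python
from collections import Counter


def top_n_words(tokens, n):
    """Return the set of the n most frequent tokens."""
    counts = Counter(tokens)
    # most_common(n) returns (token, count) pairs sorted by descending count
    return {word for word, _ in counts.most_common(n)}


tokens = "the movie was good the plot was the end".split()
top = top_n_words(tokens, 2)  # {'the', 'was'}
```

Returning a set matters for the next step: membership tests against a set are constant time, which keeps the filtered feature extractor cheap.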

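Then all you need to do is check each review's words against top_words in the feature extractor. Also, just as a suggestion, it is more efficient to use a dict comprehension than to build a list of tuples and convert it to a dict. So it might look like this: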

def word_feats(words): 
    return {word:True for word in words if word in top_words}
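Applied to a toy token list, the filtered extractor behaves like this. In this sketch `top_words` is passed as a parameter instead of being read from a global, a small deviation from the answer's version that makes the function easier to test; the tokens and vocabulary are made up.

```python
def word_feats(words, top_words):
    """Binary bag-of-words features, restricted to a vocabulary."""
    # Repeated words collapse to one key; words outside top_words are dropped
    return {word: True for word in words if word in top_words}


feats = word_feats(["a", "great", "film", "great"], {"great", "film"})
# {'great': True, 'film': True}
```

Each review now contributes at most 300 features instead of its full vocabulary, which shrinks the model the maxent trainer has to fit.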


2013-04-13 22:59:43 Jared

I changed and updated the code a bit.

import nltk 
from nltk.corpus import movie_reviews 

# Restrict features to the 300 most frequent words in the corpus; this has to be
# defined before the feature sets are built, otherwise it is never used
all_words = nltk.FreqDist(movie_reviews.words()) 
top_words = set(word for word, count in all_words.most_common(300)) 

def word_feats(words): 
    # Binary bag-of-words features, keeping only the frequent words
    return {word: True for word in words if word in top_words} 

negids = movie_reviews.fileids('neg') 
posids = movie_reviews.fileids('pos') 

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids] 
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids] 

# Integer division so the cutoffs are valid slice indices
negcutoff = len(negfeats) * 3 // 4 
poscutoff = len(posfeats) * 3 // 4 

trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff] 
#classifier = nltk.MaxentClassifier.train(trainfeats) 

# ALGORITHMS[0] is 'GIS'; max_iter=3 caps training at three iterations
algorithm = nltk.classify.MaxentClassifier.ALGORITHMS[0] 
classifier = nltk.MaxentClassifier.train(trainfeats, algorithm, max_iter=3) 

classifier.show_most_informative_features(10)


2014-02-21 14:06:47 J4cK
