3
在下面的代碼中,我知道我的naivebayes分類器工作正常,因爲它在trainset1上正常工作,但它爲什麼不在trainset2上工作?我甚至嘗試過兩個分類器,一個來自TextBlob,另一個來自nltk。nltk naivebayes分類器的文本分類
from textblob.classifiers import NaiveBayesClassifier
from textblob import TextBlob
from nltk.tokenize import word_tokenize
import nltk
trainset1 = [('I love this sandwich.', 'pos'),
('This is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('This is my best work.', 'pos'),
("What an awesome view", 'pos'),
('I do not like this restaurant', 'neg'),
('I am tired of this stuff.', 'neg'),
("I can't deal with this", 'neg'),
('He is my sworn enemy!', 'neg'),
('My boss is horrible.', 'neg')]
trainset2 = [('hide all brazil and everything plan limps to anniversary inflation plan initiallyis limping its first anniversary amid soaring prices', 'class1'),
('hello i was there and no one came', 'class2'),
('all negative terms like sad angry etc', 'class2')]
def nltk_naivebayes(trainset, test_sentence):
all_words = set(word.lower() for passage in trainset for word in word_tokenize(passage[0]))
t = [({word: (word in word_tokenize(x[0])) for word in all_words}, x[1]) for x in trainset]
classifier = nltk.NaiveBayesClassifier.train(t)
test_sent_features = {word.lower(): (word in word_tokenize(test_sentence.lower())) for word in all_words}
return classifier.classify(test_sent_features)
def textblob_naivebayes(trainset, test_sentence):
cl = NaiveBayesClassifier(trainset)
blob = TextBlob(test_sentence,classifier=cl)
return blob.classify()
test_sentence1 = "he is my horrible enemy"
test_sentence2 = "inflation soaring limps to anniversary"
print nltk_naivebayes(trainset1, test_sentence1)
print nltk_naivebayes(trainset2, test_sentence2)
print textblob_naivebayes(trainset1, test_sentence1)
print textblob_naivebayes(trainset2, test_sentence2)
輸出:
neg
class2
neg
class2
雖然test_sentence2明顯屬於1類。