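Add features to a Multinomial Naive Bayes classifier - Python
Using MultinomialNB() from scikit-learn in Python, I want to classify documents not only by the word features in the documents but also by sentiment dictionaries (meaning plain word lists, not the Python dict data type).
Suppose these are the training documents: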
train_data = ['i hate who you welcome for','i adore him with all my heart','i can not forget his warmest welcome for me','please forget all these things! this house smells really weird','his experience helps a lot to complete all these tedious things', 'just ok', 'nothing+special today']
train_labels = ['Nega','Posi','Posi','Nega','Posi','Other','Other']
psentidict = ['welcome','adore','helps','complete','fantastic']
nsentidict = ['hate','weird','tedious','forget','abhor']
osentidict = ['ok','nothing+special']
I can train a classifier on the counts of all tokens, according to the corresponding labels, as below:
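from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer  # was missing
from sklearn.pipeline import Pipeline
# Bag-of-words counts followed by Multinomial NB with Laplace smoothing
text_clf = Pipeline([('vect', CountVectorizer()),
                     ('clf', naive_bayes.MultinomialNB(alpha=1.0))])
text_clf = text_clf.fit(train_data, train_labels)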
Although I have trained on the data like this, I want to use my sentiment dictionaries as additional features for classification.
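This is because features derived from the dictionaries would make it possible to classify documents that contain out-of-vocabulary (OOV) words. With only crude Laplace smoothing (alpha = 1.0), overall accuracy will be severely limited.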
# predict() expects an iterable of documents, not a bare string
test_data = ['it is fantastic']
predicted_labels = text_clf.predict(test_data)
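With the dictionary features added, it should be possible to classify the sentence above, even though none of its words appear in the training documents.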
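To make the OOV problem concrete, here is a minimal sketch (assuming the default CountVectorizer settings used above) that checks which tokens of the test sentence are known to the fitted vocabulary. The variable names `vect`, `X_test`, and `oov` are only illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_data = ['i hate who you welcome for',
              'i adore him with all my heart',
              'i can not forget his warmest welcome for me',
              'please forget all these things! this house smells really weird',
              'his experience helps a lot to complete all these tedious things',
              'just ok',
              'nothing+special today']

# Fit the same bag-of-words vocabulary the pipeline would learn
vect = CountVectorizer().fit(train_data)

sentence = 'it is fantastic'
X_test = vect.transform([sentence])

# Tokens produced by the vectorizer's analyzer that are not in the vocabulary
oov = [tok for tok in vect.build_analyzer()(sentence)
       if tok not in vect.vocabulary_]

print('non-zero feature count:', X_test.sum())
print('out-of-vocabulary tokens:', oov)
```

If every token is out of vocabulary, the feature vector is all zeros, so the word likelihoods contribute nothing and the prediction falls back on the class priors alone.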
How can I add the features from psentidict, nsentidict, and osentidict to the Multinomial Naive Bayes classifier?
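One direction I can imagine, sketched here under assumptions rather than as a known-good solution: count how many tokens of each document fall into each word list, and join those three counts with the bag-of-words counts using FeatureUnion. `LexiconCounter` is a hypothetical helper written for this sketch (it is not part of scikit-learn), and it tokenizes by simple whitespace splitting so that an entry such as 'nothing+special' still matches.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline

train_data = ['i hate who you welcome for',
              'i adore him with all my heart',
              'i can not forget his warmest welcome for me',
              'please forget all these things! this house smells really weird',
              'his experience helps a lot to complete all these tedious things',
              'just ok',
              'nothing+special today']
train_labels = ['Nega', 'Posi', 'Posi', 'Nega', 'Posi', 'Other', 'Other']
psentidict = ['welcome', 'adore', 'helps', 'complete', 'fantastic']
nsentidict = ['hate', 'weird', 'tedious', 'forget', 'abhor']
osentidict = ['ok', 'nothing+special']


class LexiconCounter(BaseEstimator, TransformerMixin):
    """Hypothetical helper: one column per word list, holding the number
    of whitespace-separated tokens of a document found in that list."""

    def __init__(self, lexicons):
        self.lexicons = lexicons

    def fit(self, X, y=None):
        # Nothing to learn; the word lists are fixed in advance
        return self

    def transform(self, X):
        sets = [set(lex) for lex in self.lexicons]
        rows = [[sum(tok in s for tok in doc.lower().split()) for s in sets]
                for doc in X]
        # Non-negative counts, as MultinomialNB requires
        return np.array(rows, dtype=float)


features = FeatureUnion([
    ('words', CountVectorizer()),
    ('lexicons', LexiconCounter([psentidict, nsentidict, osentidict])),
])
text_clf = Pipeline([('features', features),
                     ('clf', MultinomialNB(alpha=1.0))])
text_clf.fit(train_data, train_labels)

predicted_labels = text_clf.predict(['it is fantastic', 'i abhor it'])
print(predicted_labels)
```

In this sketch 'fantastic' and 'abhor' never occur in the training documents, yet they still raise the positive and negative lexicon counts, so they can influence the prediction. Each lexicon column is treated as just one more count feature, so its influence relative to the ordinary word counts is a design choice that may need tuning.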