TfIdf matrix returns the wrong number of features for BernoulliNB

Using the Python library sklearn, I am trying to extract features from a training set and fit a BernoulliNB classifier with this data.

After the classifier has been trained, I want to predict (classify) some new test data. Unfortunately, I get this error:

Traceback (most recent call last):
  File "sentiment_analysis.py", line 45, in <module>
    main()
  File "sentiment_analysis.py", line 41, in main
    prediction = classifier.predict(tfidf_data)
  File "\Python27\lib\site-packages\sklearn\naive_bayes.py", line 64, in predict
    jll = self._joint_log_likelihood(X)
  File "\Python27\lib\site-packages\sklearn\naive_bayes.py", line 724, in _joint_log_likelihood
    % (n_features, n_features_X))
ValueError: Expected input with 4773 features, got 13006 instead

Here is my code:

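from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import BernoulliNB

# Train the classifier
# (load_file() and preprocess() are my own helpers, not shown here)
data, target = load_file('validation/validation_set_5.csv')
tf_idf = preprocess(data)
classifier = BernoulliNB().fit(tf_idf, target)

# Predict test data (test holds the raw test documents; loading not shown)
count_vectorizer = CountVectorizer(binary=True)
test = count_vectorizer.fit_transform(test)
tfidf_data = TfidfTransformer(use_idf=False).fit_transform(test)
prediction = classifier.predict(tfidf_data)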
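The failure can be reproduced without the original CSV. The sketch below uses small hypothetical toy documents in place of validation_set_5.csv and the real test set, and follows the same pattern as the code above: fresh transformers are fitted on the test documents, so the test matrix ends up with a different number of columns than the matrix the classifier was trained on. The exact wording of the ValueError depends on the sklearn version, but the exception type is the same.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data standing in for the real training and test sets.
train_docs = ["good movie", "bad movie", "great plot", "awful acting"]
train_target = [1, 0, 1, 0]
test_docs = ["good acting and a great soundtrack with a wonderful cast"]

# Training: fit a vectorizer and a transformer on the training documents.
train_counts = CountVectorizer(binary=True).fit_transform(train_docs)
train_tfidf = TfidfTransformer(use_idf=False).fit_transform(train_counts)
classifier = BernoulliNB().fit(train_tfidf, train_target)

# The bug: brand-new transformers are fitted on the test documents,
# so the test matrix gets its own vocabulary and its own column count.
test_counts = CountVectorizer(binary=True).fit_transform(test_docs)
test_tfidf = TfidfTransformer(use_idf=False).fit_transform(test_counts)

# The default token pattern ignores one-letter words such as "a".
print(train_tfidf.shape[1], test_tfidf.shape[1])  # prints: 7 8

caught = None
try:
    classifier.predict(test_tfidf)
except ValueError as err:
    caught = err  # the feature-count mismatch surfaces as a ValueError
    print("ValueError:", err)
```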


2015-10-05 fsteinbauer

This is why you get that error:

test = count_vectorizer.fit_transform(test) 
tfidf_data = TfidfTransformer(use_idf=False).fit_transform(test)

Here you should only use the old transformers (CountVectorizer and TfidfTransformer are your transformers) that were fitted on the training set.

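fit_transform means that you fit these transformers on the new set, losing all information about the old fit, and then transform "test" with these newly fitted transformers (which learned from the new samples and have a different feature set). The number of columns CountVectorizer produces equals the size of the vocabulary it learned while fitting, so refitting on the test set converts it into a new set of features that is incompatible with the features used for the training set. That is why the classifier, trained on 4773 features, receives 13006. To fix this, call transform (not fit_transform) on the old transformers, the ones that were fitted on the training set.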
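The difference between the two methods can be seen on a single vectorizer. In this sketch, with hypothetical toy documents, transform() keeps the 7-word training vocabulary (unseen test words are simply ignored), while calling fit_transform() on the test documents overwrites the learned vocabulary, and the column count changes with it.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents.
train_docs = ["good movie", "bad movie", "great plot", "awful acting"]
test_docs = ["good acting and a great soundtrack"]

vectorizer = CountVectorizer(binary=True)
train_counts = vectorizer.fit_transform(train_docs)  # learns a 7-word vocabulary

# transform() reuses the training vocabulary: "and" and "soundtrack" are
# unknown and dropped, so the column count matches the training matrix.
same_columns = vectorizer.transform(test_docs)

# fit_transform() on the test set replaces vectorizer.vocabulary_ with a new
# 5-word vocabulary ("a" is ignored by the default token pattern).
refit_columns = vectorizer.fit_transform(test_docs)

print(train_counts.shape, same_columns.shape, refit_columns.shape)
# prints: (4, 7) (1, 7) (1, 5)
print(len(vectorizer.vocabulary_))  # prints: 5, the training vocabulary is gone
```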

You should write something like:

test = old_count_vectorizer.transform(test) 
tfidf_data = old_tfidf_transformer.transform(test)
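An end-to-end version of that fix might look like the sketch below. It assumes hypothetical toy documents in place of the output of load_file() and preprocess(), and keeps explicit references to the fitted CountVectorizer and TfidfTransformer so the exact same objects can be reused with transform() on the test data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data standing in for the real training and test sets.
train_docs = ["good movie", "great plot", "bad movie", "awful acting"]
train_target = [1, 1, 0, 0]
test_docs = ["a good plot", "awful movie"]

# Fit the transformers once, on the training data, and keep references to them.
count_vectorizer = CountVectorizer(binary=True)
tfidf_transformer = TfidfTransformer(use_idf=False)
train_counts = count_vectorizer.fit_transform(train_docs)
train_tfidf = tfidf_transformer.fit_transform(train_counts)

classifier = BernoulliNB().fit(train_tfidf, train_target)

# Reuse the same fitted objects with transform(), not fit_transform().
test_counts = count_vectorizer.transform(test_docs)
tfidf_data = tfidf_transformer.transform(test_counts)

# Both matrices now share the training vocabulary's columns.
print(train_tfidf.shape[1], tfidf_data.shape[1])  # prints: 7 7
prediction = classifier.predict(tfidf_data)
print(prediction)  # prints: [1 0]
```

If you prefer not to track the fitted transformers by hand, sklearn.pipeline.Pipeline (or make_pipeline) can chain CountVectorizer, TfidfTransformer and BernoulliNB; calling predict() on a fitted pipeline applies transform() from the already-fitted steps, which avoids this mistake by construction.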


2015-10-05 10:42:23
