2014-07-14 28 views

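MultinomialNB in Python/pandas returns an "objects are not aligned" error on predict

I have a large set of email subject lines with performance scores, and I want to use them to predict which subject lines will perform well. When I run my MultinomialNB, I get an "objects are not aligned" error. Here is the code.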

import pandas as pd 
import numpy as np 

from sklearn.feature_extraction.text import CountVectorizer 
from sklearn.naive_bayes import MultinomialNB 

input=pd.read_csv('subject_tool_input_500.csv') 
input.subject[input.subject.isnull()]=' ' 
good=np.asarray(input.unique_open_performance>0) 
subjects=input.subject 

classifier = MultinomialNB() 
count_vectorizer = CountVectorizer(strip_accents='unicode') 
counts=count_vectorizer.fit_transform(subjects) 

classifier.fit(counts,good) 
classifier.predict('test subject line') 

This returns the following error.

>>> classifier.predict('test subject line') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/Library/Python/2.7/site-packages/sklearn/naive_bayes.py", line 63, in predict 
    jll = self._joint_log_likelihood(X) 
    File "/Library/Python/2.7/site-packages/sklearn/naive_bayes.py", line 457, in _joint_log_likelihood 
    return (safe_sparse_dot(X, self.feature_log_prob_.T) 
    File "/Library/Python/2.7/site-packages/sklearn/utils/extmath.py", line 83, in safe_sparse_dot 
    return np.dot(a, b) 
ValueError: objects are not aligned 

This is the input I am using.

>>> subjects 
0       Thanksgiving Dinner Delivered 
1   It's Not Too Late To Order for Thanksgiving 
2    Stress Free Christmas Gift They'll Love 
3  Save $10 On Christmas Gift Certificates - Inst... 
4     Need a Last Minute Christmas Gift? 
5       Give Mom Something Special! 
6    Yummy Steaks For Dad - $15 Off Your Order 
7  Order a romantic dinner today and get it by Va... 
8  Taiyo Yuden Unveils Latest in SAW Filter and D... 
9  Taiyo Yuden New Noise Reducing Ferrite Bead Ch... 
10 Lithium Ion Capacitors Are Ultimate Replacemen... 
11         Art Wolfe Newsletter 
12       Art Wolfe Seminar Tour 2014 
13      Art Wolfe Spring 2014 Newsletter 
14     Day of the Dead Sale at Art Wolfe 
... 
8797625         Подписка на рассылку 
8797626         Подписка на рассылку 
8797627        Ramadan Mubarak from MFP 
8797628     Ramadan Mubarak from Insaan Relief 
8797629    UK Muslims! You have one new message... 
8797630 Open House - 1249 Los Robles Place, Pomona CA ... 
8797631 Open House - Custom Built Home by Conrad Buff ... 
8797632 Open House - Custom built by Buff, Smith & Hen... 
8797633 Open House - Custom Built Home by Conrad Buff ... 
8797634 Open House - Custom Built Home by Conrad Buff ... 
8797635 Open House - Custom Built Home by Conrad Buff ... 
8797636 Open House - Buff, Smith & Hensman custom buil... 
8797637 RAMADAN PROGRAMS: Dars-e-Qur'an in Rawalpindi ... 
8797638    Dars-e-Qur'an by Shaykh Hammad Mahmood 
8797639    Dars-e-Qur'an by Shaykh Hammad Mahmood 
Name: subject, Length: 8797640, dtype: object 
>>> counts 
<8797640x1172387 sparse matrix of type '<type 'numpy.int64'>' 
    with 62516240 stored elements in Compressed Sparse Column format> 
>>> good 
array([ True, False, True, ..., False, True, True], dtype=bool) 

I don't know why this is happening. Last week I got this working without pandas, but I have been trying to use a DataFrame to make some follow-up work easier.

Does it work if you change this line: 'subjects=input.subject' to 'subjects=input.subject.values'? – EdChum

Unfortunately, subjects=input.subject.values didn't help. – neelshiv

Answers

1

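I'm an idiot. I needed to turn the subject line I was trying to predict into a count matrix with the already fitted vectorizer, rather than passing the raw string to predict, so the end of the script should look more like this.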

subcount=count_vectorizer.transform(["this is a test subject"]) 
classifier.predict(subcount) 

Hopefully people in the future will see this and avoid making the same mistake.
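For anyone who wants to reproduce the fix end to end, here is a minimal sketch. The toy subject lines and labels are made up for illustration and stand in for the real CSV. The key point is that transform() reuses the vocabulary learned by fit_transform(), so the new matrix has the same number of columns the classifier was trained on.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the real subject lines and labels.
train_subjects = [
    "Thanksgiving Dinner Delivered",
    "Need a Last Minute Christmas Gift?",
    "Art Wolfe Newsletter",
    "Open House - Custom Built Home",
]
train_good = [True, True, False, False]

# Learn the vocabulary and build the training count matrix.
count_vectorizer = CountVectorizer(strip_accents="unicode")
train_counts = count_vectorizer.fit_transform(train_subjects)

classifier = MultinomialNB()
classifier.fit(train_counts, train_good)

# transform() (not fit_transform()) maps new text onto the fitted vocabulary,
# so the column count matches what the classifier expects.
new_counts = count_vectorizer.transform(["Christmas dinner newsletter"])
prediction = classifier.predict(new_counts)

print(train_counts.shape, new_counts.shape, prediction)
```

Note that predict receives a list of documents turned into a matrix with one row per document, not a bare string.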

0

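If the classifier was trained on TF-IDF features rather than raw counts, you also need to apply the fitted TF-IDF transformer (tfidf_transformer below), not just the count vectorizer.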

subcount=count_vectorizer.transform(["this is a test subject"]) 
tfidf = tfidf_transformer.transform(subcount) 
classifier.predict(tfidf)
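The snippet above only works if tfidf_transformer already exists and the classifier was trained on its output. The question's code does neither, so the following is a sketch under that assumption, again with made-up toy data. It uses scikit-learn's TfidfTransformer, fitted on the training counts, and then applies the same two steps to the new subject line.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the real subject lines and labels.
train_subjects = [
    "Thanksgiving Dinner Delivered",
    "Need a Last Minute Christmas Gift?",
    "Art Wolfe Newsletter",
    "Open House - Custom Built Home",
]
train_good = [True, True, False, False]

# Step 1: raw term counts.
count_vectorizer = CountVectorizer(strip_accents="unicode")
train_counts = count_vectorizer.fit_transform(train_subjects)

# Step 2: reweight the counts with TF-IDF, fitted on the training data only.
tfidf_transformer = TfidfTransformer()
train_tfidf = tfidf_transformer.fit_transform(train_counts)

# The classifier is trained on TF-IDF features here, not on raw counts.
classifier = MultinomialNB()
classifier.fit(train_tfidf, train_good)

# New text goes through the same two fitted steps before predict().
subcount = count_vectorizer.transform(["Christmas dinner newsletter"])
tfidf = tfidf_transformer.transform(subcount)
prediction = classifier.predict(tfidf)

print(tfidf.shape, prediction)
```

Whichever features are used for training, the input to predict has to go through the same fitted transformations; mixing raw counts at training time with TF-IDF at prediction time would run but give features on a different scale.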