NotFittedError: TfidfVectorizer - Vocabulary wasn't fitted

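I'm trying to build a sentiment analyzer using scikit-learn/pandas. Building and evaluating the model works, but attempting to classify new sample text does not: it raises NotFittedError: TfidfVectorizer - Vocabulary wasn't fitted.

My code: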

import csv 
import pandas as pd 
import numpy as np 
from sklearn.model_selection import train_test_split 
from sklearn.feature_extraction.text import TfidfVectorizer 
from sklearn.naive_bayes import BernoulliNB 
from sklearn.metrics import classification_report 
from sklearn.metrics import accuracy_score 

infile = 'Sentiment_Analysis_Dataset.csv' 
data = "SentimentText" 
labels = "Sentiment" 


class Classifier(): 
    def __init__(self): 
     self.train_set, self.test_set = self.load_data() 
     self.counts, self.test_counts = self.vectorize() 
     self.classifier = self.train_model() 

    def load_data(self): 

     df = pd.read_csv(infile, header=0, error_bad_lines=False) 
     train_set, test_set = train_test_split(df, test_size=.3) 
     return train_set, test_set 

    def train_model(self): 
     classifier = BernoulliNB() 
     targets = self.train_set[labels] 
     classifier.fit(self.counts, targets) 
     return classifier 


    def vectorize(self): 

     vectorizer = TfidfVectorizer(min_df=5, 
           max_df = 0.8, 
           sublinear_tf=True, 
           ngram_range = (1,2), 
           use_idf=True) 
     counts = vectorizer.fit_transform(self.train_set[data]) 
     test_counts = vectorizer.transform(self.test_set[data]) 

     return counts, test_counts 

    def evaluate(self): 
     test_counts,test_set = self.test_counts, self.test_set 
     predictions = self.classifier.predict(test_counts) 
     print (classification_report(test_set[labels], predictions)) 
     print ("The accuracy score is {:.2%}".format(accuracy_score(test_set[labels], predictions))) 


    def classify(self, input): 
     input_text = input 

     input_vectorizer = TfidfVectorizer(min_df=5, 
           max_df = 0.8, 
           sublinear_tf=True, 
           ngram_range = (1,2), 
           use_idf=True) 
     input_counts = input_vectorizer.transform(input_text) 
     predictions = self.classifier.predict(input_counts) 
     print(predictions) 

myModel = Classifier() 

text = ['I like this I feel good about it', 'give me 5 dollars'] 

myModel.classify(text) 
myModel.evaluate()

The error:

Traceback (most recent call last): 
    File "sentiment.py", line 74, in <module> 
    myModel.classify(text) 
    File "sentiment.py", line 66, in classify 
    input_counts = input_vectorizer.transform(input_text) 
    File "/home/rachel/Sentiment/ENV/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 1380, in transform 
    X = super(TfidfVectorizer, self).transform(raw_documents) 
    File "/home/rachel/Sentiment/ENV/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 890, in transform 
    self._check_vocabulary() 
    File "/home/rachel/Sentiment/ENV/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 278, in _check_vocabulary 
    check_is_fitted(self, 'vocabulary_', msg=msg), 
    File "/home/rachel/Sentiment/ENV/lib/python3.5/site-packages/sklearn/utils/validation.py", line 690, in check_is_fitted 
    raise _NotFittedError(msg % {'name': type(estimator).__name__}) 
sklearn.exceptions.NotFittedError: TfidfVectorizer - Vocabulary wasn't fitted.

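I don't know what the problem could be. In my classify method I create a brand-new vectorizer to process the text I want to classify, separate from the vectorizer used to create the training and test data for the model.

Thanks

Source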

2017-05-26 killer_manatee

In any case, in your 'classify' function you create a new vectorizer object and then call 'transform' on it before it has ever been fitted.

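Adding to @AryaMcCarthy's answer: the whole classify function in this class is misleading. The constructor allows input data to be passed in, so why pass it again when classifying?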
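
To illustrate the point made in the first comment, here is a minimal sketch of the failure and of the same call succeeding once the vectorizer has been fitted. The train_docs and new_docs lists are made up for the example, and default TfidfVectorizer settings are assumed because min_df=5 from the question cannot be satisfied by three documents.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data, only for illustration.
train_docs = ["I like this", "I feel good about it", "give me 5 dollars"]
new_docs = ["I like dollars"]

# A brand-new vectorizer has no vocabulary yet, so transform() fails.
fresh = TfidfVectorizer()
try:
    fresh.transform(new_docs)
    raised = False
except NotFittedError:
    raised = True

# After fit(), the same kind of object can transform unseen text.
fitted = TfidfVectorizer()
fitted.fit(train_docs)
features = fitted.transform(new_docs)

print(raised)
print(features.shape)
```

The number of columns in features equals the size of the vocabulary learned in fit, which is why the classifier needs the same fitted vectorizer at prediction time.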

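You fitted a vectorizer, but then you throw it away, because it does not exist past the lifetime of your vectorize function. Instead, save the vectorizer on the instance inside vectorize once it has been fitted: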

self._vectorizer = vectorizer

Then, in your classify function, don't create a new vectorizer. Instead, use the one you fitted to the training data:

input_counts = self._vectorizer.transform(input_text)
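
Putting the two changes together, a trimmed-down version of the class might look like the sketch below. It is not the asker's full program: the CSV loading and train/test split are left out, the constructor arguments texts and targets and the toy data are hypothetical, and min_df/max_df are omitted so that the example can be fitted on four documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB


class Classifier:
    def __init__(self, texts, targets):
        # Keep the fitted vectorizer on the instance so classify() can reuse it.
        self._vectorizer = TfidfVectorizer(sublinear_tf=True, ngram_range=(1, 2))
        counts = self._vectorizer.fit_transform(texts)
        self.classifier = BernoulliNB().fit(counts, targets)

    def classify(self, input_text):
        # transform() only: reuse the vocabulary learned from the training data.
        input_counts = self._vectorizer.transform(input_text)
        return self.classifier.predict(input_counts)


# Hypothetical toy data standing in for Sentiment_Analysis_Dataset.csv.
texts = [
    "I like this a lot",
    "I feel good about it",
    "this is terrible",
    "I feel bad about it",
]
targets = [1, 1, 0, 0]

model = Classifier(texts, targets)
predictions = model.classify(["I like this I feel good about it", "give me 5 dollars"])
print(predictions)
```

Because classify calls only transform, new text is mapped onto the same feature columns the classifier was trained on.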

Source

2017-05-26 05:09:14
