sklearn GridSearchCV: how to get a classification report?

I am using GridSearchCV like this:

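import joblib
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

corpus = load_files('corpus')

# Keep each stop word in its original and its title-cased form
with open('stopwords.txt', 'r') as f:
    stop_words = [y for x in f.read().split('\n') for y in (x, x.title())]

x = corpus.data
y = corpus.target

pipeline = Pipeline([
    ('vec', CountVectorizer(stop_words=stop_words)),
    ('classifier', MultinomialNB())])

parameters = {'vec__ngram_range': [(1, 1), (1, 2)],
              'classifier__alpha': [1e-2, 1e-3],
              'classifier__fit_prior': [True, False]}

gs_clf = GridSearchCV(pipeline, parameters, n_jobs=-1, cv=5, scoring="f1", verbose=10)

gs_clf = gs_clf.fit(x, y)

# Persist only the refitted best pipeline, not the whole search object
joblib.dump(gs_clf.best_estimator_, 'MultinomialNB.pkl', compress=1)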

Then, in another file, to classify new documents (not from the corpus), I do this:

classifier = joblib.load(filepath)  # path to .pkl file
result = classifier.predict(tokenlist)

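My question is: where do I get the values that classification_report needs?

In many other examples, I see people split the corpus into a training set and a test set. But since I am using GridSearchCV, which does k-fold cross-validation, I don't need to do that. So how do I get those values from GridSearchCV?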

2016-11-15 user3813234

Just one question: doesn't `gs_clf.fit(X, Y)` return `None`? – BallpointBen

@BallpointBen Why would it? x and y contain the data – user3813234
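On the question in the comments above: as far as I know, `fit` on scikit-learn estimators returns the estimator itself, so `gs_clf = gs_clf.fit(x, y)` just rebinds the same object. A minimal sketch to check this, assuming a tiny made-up count matrix in place of the vectorized corpus:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB

# Made-up count features (assumption: stands in for the vectorized corpus).
X = np.array([[2, 0], [3, 1], [0, 2], [1, 3], [2, 1], [0, 3], [3, 0], [1, 2]])
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])

gs = GridSearchCV(MultinomialNB(), {"alpha": [1e-2, 1e-3]}, cv=2)
fitted = gs.fit(X, y)

# fit() hands back the same search object, now carrying best_estimator_.
print(fitted is gs)
print(hasattr(gs, "best_estimator_"))
```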

The best model is in clf.best_estimator_. You need to fit it on your training data, then predict on your test data and pass ytest and ypred to the classification report.

2016-11-15 20:41:22 simon

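Thanks for your reply! So just to be clear: for GridSearchCV, I use all the data (in my case corpus.data and corpus.target), but for the best classifier, I use train_test_split to split the data into x_test, X_train, Y_test and Y_train. – user3813234

Yes. If you want the scores to be reliable, they need to be measured on a different set of data from the one used for fitting. – simon

Or, if you have enough data, you could split the data before doing the grid search. – simon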
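A minimal sketch of the "split first, then grid search" variant, in case it helps. The toy documents, labels and split size are made up for illustration; the pipeline and parameter names follow the question.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up toy corpus (assumption: stands in for corpus.data / corpus.target).
docs = ["good great film", "great fun movie", "good fun plot", "great good cast",
        "bad boring film", "boring dull movie", "bad dull plot", "dull bad cast"] * 5
labels = [1, 1, 1, 1, 0, 0, 0, 0] * 5

# Hold out a test set BEFORE the search, so the report uses unseen data.
x_train, x_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.25, stratify=labels, random_state=0)

pipeline = Pipeline([("vec", CountVectorizer()),
                     ("classifier", MultinomialNB())])
parameters = {"vec__ngram_range": [(1, 1), (1, 2)],
              "classifier__alpha": [1e-2, 1e-3]}

# Cross-validation happens inside the training portion only.
gs_clf = GridSearchCV(pipeline, parameters, cv=5, scoring="f1")
gs_clf.fit(x_train, y_train)

# With the default refit=True, predict() uses the refitted best estimator.
y_pred = gs_clf.predict(x_test)
report = classification_report(y_test, y_pred)
print(report)
```

The cross-validation scores pick the hyperparameters; the held-out test set is only used once, for the report.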

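If you have the GridSearchCV object:

from sklearn.metrics import classification_report
clf = GridSearchCV(....)
clf.fit(x_train, y_train)
# classification_report expects the true labels first, then the predictions
print(classification_report(y_test, clf.best_estimator_.predict(x_test)))

If you have already saved the best estimator and loaded it again, then:

classifier = joblib.load(filepath)
print(classification_report(y_test, classifier.predict(x_test)))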
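For the saved-and-loaded case, here is a self-contained sketch of the round trip. The toy data and the temporary file location are assumptions for illustration, and a directly fitted pipeline stands in for `gs_clf.best_estimator_`.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up toy data (assumption), already split into train and test parts.
x_train = ["good great film", "great fun movie",
           "bad boring film", "boring dull movie"]
y_train = [1, 1, 0, 0]
x_test = ["good fun film", "dull bad movie"]
y_test = [1, 0]

# Stands in for gs_clf.best_estimator_ from the grid search.
best = Pipeline([("vec", CountVectorizer()),
                 ("classifier", MultinomialNB())])
best.fit(x_train, y_train)

# Dump to a temporary .pkl file and load it back, as in the question.
with tempfile.TemporaryDirectory() as tmp:
    filepath = os.path.join(tmp, "MultinomialNB.pkl")
    joblib.dump(best, filepath, compress=1)
    classifier = joblib.load(filepath)

# True labels first, predictions second.
y_pred = classifier.predict(x_test)
report = classification_report(y_test, y_pred)
print(report)
```

Note that the loaded classifier only gives you predictions; the true labels for the report still have to come from a labeled test set that you keep yourself.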

2017-12-12 21:23:27
