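Scikit-learn: getting the probability that an example belongs to each class
Hello, I am trying to classify text into 4 categories, and along with the prediction I want to print the probability that the text belongs to each category.
After reading the scikit-learn documentation, I think I should use predict_proba.
My code so far looks like this: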
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import os
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.datasets import load_files
from sklearn.svm import SVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
string = sys.argv[1]  # text to classify, passed on the command line
sets = load_files('scikit')  # load the training set
count_vect = CountVectorizer(analyzer='char_wb', ngram_range=(0, 3), min_df=1)
X_train_counts = count_vect.fit_transform(sets.data)
tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
clf = MultinomialNB().fit(X_train_tfidf, sets.target)
docs_new = [string]
X_new_counts = count_vect.transform(docs_new)
X_new_tfidf = tfidf_transformer.transform(X_new_counts)
predicted = clf.predict(X_new_tfidf)
for doc, category in zip(docs_new, predicted):
    print('%r => %s' % (doc, sets.target_names[category]))  # prints the prediction, and it is correct
print(clf.predict_proba(sets.target_names))  # trying to get the probabilities for all classes
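Sadly, the output is: ValueError: objects are not aligned
I have tried many different ways to do this and searched the web a lot, but nothing seems to work. Any advice would be greatly appreciated. Thanks, Nico.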
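Where does the error occur? While fitting the MNB classifier, or somewhere else? If so, what kind of object is sets.target? – tttthomasssss
You would get the probabilities with clf.predict_proba(X_new_tfidf) – Stergios
@Stergios That gives the correct probabilities, feel free to post it as an answer. –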
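To illustrate the suggestion in the comments, here is a minimal sketch of calling predict_proba on the vectorized new document instead of on the class names. The toy corpus, its labels, and the names train_docs, train_targets and target_names are made up for illustration and stand in for load_files('scikit'); ngram_range=(1, 3) is also an assumption of this sketch, not the setting from the question.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus with 4 classes, standing in for load_files('scikit')
train_docs = [
    "the match ended with a late goal",
    "the team won the league title",
    "the new phone has a faster chip",
    "the laptop ships with more memory",
    "the senate passed the budget bill",
    "the president signed the new law",
    "the recipe needs butter and flour",
    "bake the bread until golden brown",
]
train_targets = np.array([0, 0, 1, 1, 2, 2, 3, 3])
target_names = ["sport", "tech", "politics", "food"]

# Same steps as in the question: counts, then tf-idf, then MultinomialNB
count_vect = CountVectorizer(analyzer="char_wb", ngram_range=(1, 3))
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(count_vect.fit_transform(train_docs))
clf = MultinomialNB().fit(X_train_tfidf, train_targets)

docs_new = ["the striker scored a goal"]
X_new_tfidf = tfidf_transformer.transform(count_vect.transform(docs_new))

# predict_proba takes the same kind of feature matrix as predict,
# not the list of class names
proba = clf.predict_proba(X_new_tfidf)  # shape: (n_documents, n_classes)

# The columns of proba follow clf.classes_, which index into target_names
for doc, row in zip(docs_new, proba):
    for class_index, p in zip(clf.classes_, row):
        print('%r => %s: %.3f' % (doc, target_names[class_index], p))
```

Each row of the result holds one probability per class and sums to 1, so the class reported by predict is the one with the largest value in that row.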