Confusion matrix rows are mismatched

I created a confusion matrix and it works fine, but its rows do not seem to be connected to the labels.

I have a list of strings that is split into train and test parts:

train + test: 
positive: 16 + 4 = 20 
negprivate: 53 + 14 = 67 
negstratified: 893 + 224 = 1117

The confusion matrix is built on the test data:

[[ 0 14 0] 
[ 3 220 1] 
[ 0 4 0]]

Here is the code:

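import logging

import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
from sklearn.metrics import accuracy_score, confusion_matrix

my_tags = ['negprivate', 'negstratified', 'positive']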

def plot_confusion_matrix(cm, title='Confusion matrix', cmap=plt.cm.Blues): 
    logging.info('plot_confusion_matrix') 
    plt.imshow(cm, interpolation='nearest', cmap=cmap) 
    plt.title(title) 
    plt.colorbar() 
    tick_marks = np.arange(len(my_tags)) 
    target_names = my_tags 
    plt.xticks(tick_marks, target_names, rotation=45) 
    plt.yticks(tick_marks, target_names) 
    plt.tight_layout() 
    plt.ylabel('True label') 
    plt.xlabel('Predicted label') 
    plt.show() 

def evaluate_prediction(target, predictions, taglist, title="Confusion matrix"): 
    logging.info('Evaluate prediction') 
    print('accuracy %s' % accuracy_score(target, predictions)) 
    cm = confusion_matrix(target, predictions) 
    print('confusion matrix\n %s' % cm) 
    print('(row=expected, col=predicted)') 
    print('rows: \n %s \n %s \n %s ' % (taglist[0], taglist[1], taglist[2]))

    cm_normalized = cm.astype('float')/cm.sum(axis=1)[:, np.newaxis] 
    plot_confusion_matrix(cm_normalized, title + ' Normalized')

...

test_targets, test_regressors = zip(
    *[(doc.tags[0], doc2vec_model.infer_vector(doc.words, steps=20)) for doc in alltest]) 
logreg = linear_model.LogisticRegression(n_jobs=1, C=1e5) 
logreg = logreg.fit(train_regressors, train_targets) 
evaluate_prediction(test_targets, logreg.predict(test_regressors), my_tags, title=str(doc2vec_model))

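The problem is that I actually have to look at the numbers in the resulting matrix and change the order of my_tags so that the two agree with each other. As far as I understand, this should happen in some automatic way. Where does it happen?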


2016-09-23 Talka

It is always best to have integer class labels; everything seems to run more smoothly. You can get these with LabelEncoder, i.e.

from sklearn import preprocessing 
my_tags = ['negprivate', 'negstratified', 'positive'] 
le = preprocessing.LabelEncoder() 
new_tags = le.fit_transform(my_tags)

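So now you will have [0 1 2] as the new tags. When you do your plotting you want the labels to be readable, so you can use inverse_transform to get the original labels back, i.e.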

le.inverse_transform([0])[0]

Output:

'negprivate'
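To make the mapping concrete, here is a minimal sketch of how LabelEncoder assigns codes. The unsorted input order and the variable names `codes`, `tick_labels`, and `decoded` are assumptions for illustration only. The point is that the integer codes follow the sorted order of the labels, not the order in which they were passed in, and that `classes_` holds the label for each code.

```python
from sklearn.preprocessing import LabelEncoder

# Deliberately unsorted input (hypothetical order, for illustration only).
my_tags = ['positive', 'negstratified', 'negprivate']

le = LabelEncoder()
codes = le.fit_transform(my_tags)

# classes_ stores the labels in sorted order; code i corresponds to classes_[i].
tick_labels = list(le.classes_)

# inverse_transform takes an array-like of codes and returns the string labels.
decoded = le.inverse_transform([0, 1, 2])

print(codes)
print(tick_labels)
print(decoded)
```

If the targets are encoded this way, `tick_labels` can be passed to `plt.xticks` and `plt.yticks` instead of a hand-ordered list.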


2016-09-23 11:27:49 ncfirth

Thanks, I had not heard of LabelEncoder before; I will give it a try. – Talka

I think it is just the sorted order of your labels, i.e. the output of np.unique(target).
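A small sketch to check this, assuming toy labels rather than the asker's documents (`y_true`, `y_pred`, and `row_order` are made-up names). It also sums the rows of the matrix posted in the question, since each row sum is the number of test items whose true class is that row.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical, not the asker's data), deliberately unsorted.
y_true = ['positive', 'negstratified', 'negprivate', 'negstratified', 'positive']
y_pred = ['negstratified', 'negstratified', 'negprivate', 'negstratified', 'positive']

# With no labels argument, rows and columns follow the sorted unique labels.
row_order = np.unique(y_true + y_pred)
cm = confusion_matrix(y_true, y_pred)

# Matrix copied from the question; row sums give the per-class test counts.
question_cm = np.array([[0, 14, 0], [3, 220, 1], [0, 4, 0]])
row_sums = question_cm.sum(axis=1)

print(row_order)
print(cm)
print(row_sums)
```

The row sums come out as 14, 224, and 4, which are the test counts given in the question for negprivate, negstratified, and positive. That is alphabetical order, consistent with the sorted-label explanation.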


2016-09-23 05:20:06 maxymoo

I guess this happens in the line test_targets, test_regressors = zip(*[(doc.tags[0], doc2vec_model.infer_vector(doc.words, steps=20)) for doc in alltest]) – Talka

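I mean that if the first sentence has the tag 'negstratified', then that tag becomes the first row in the matrix, and so on. I cannot understand how to control the order from code. – Talka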
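One possible way to control the order from code is the `labels` argument of `confusion_matrix`, which sets the row and column order explicitly. The sketch below is not the asker's code: the helper name `ordered_confusion_matrix` and the toy data are assumptions. The same list is then reused for the plot ticks, so the matrix and the axis labels cannot drift apart.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def ordered_confusion_matrix(y_true, y_pred, taglist):
    """Return (counts, row-normalized) matrices with rows/columns in taglist order."""
    cm = confusion_matrix(y_true, y_pred, labels=taglist)
    row_totals = cm.sum(axis=1, keepdims=True)
    # Guard against division by zero for a class that never occurs in y_true.
    cm_normalized = cm / np.maximum(row_totals, 1)
    return cm, cm_normalized


# Toy labels (hypothetical, not the asker's data).
y_true = ['positive', 'negstratified', 'negprivate', 'negstratified', 'positive']
y_pred = ['negstratified', 'negstratified', 'negprivate', 'negstratified', 'positive']

# Any order works; the matrix rows and columns will follow it.
my_tags = ['positive', 'negprivate', 'negstratified']
cm, cm_norm = ordered_confusion_matrix(y_true, y_pred, my_tags)

print(cm)
print(cm_norm)
```

In the question's `evaluate_prediction`, the equivalent change would be to pass `labels=taglist` to `confusion_matrix` and keep using the same list for the tick labels.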
