2017-08-11

How do I get the class names back when using MultiLabelBinarizer?

I have a CSV file that looks like this:

target,data 
AAA,some text document 
AAA;BBB,more text 
AAC,more text 

Here is the code:

from sklearn.multiclass import OneVsRestClassifier 
from sklearn.preprocessing import MultiLabelBinarizer 
from sklearn.feature_extraction.text import HashingVectorizer 
from sklearn.naive_bayes import BernoulliNB 
import pandas as pd 

pdf = pd.read_csv("Train.csv", sep=',') 
pdfT = pd.read_csv("Test.csv", sep=',') 

X1 = pdf['data'] 
Y1 = [[t for t in tar.split(';')] for tar in pdf['target']] 
X2 = pdfT['data'] 
Y2 = [[t for t in tar.split(';')] for tar in pdfT['target']] 

# Vectorizer data 
hv = HashingVectorizer(stop_words='english', non_negative=True) 
X1 = hv.transform(X1) 
X2 = hv.transform(X2) 

mlb = MultiLabelBinarizer() 
mlb.fit(Y1+Y2) 
Y1 = mlb.transform(Y1) 
# mlb.classes_ looks like ['AAA','AAC','BBB',...] len(mlb.classes_)==1363 

# Y1 looks like [[0,0,0,....0,0,0], ... ] now 

# fit 
clsf = OneVsRestClassifier(BernoulliNB(alpha=.001)) 
clsf.fit(X1,Y1) 

# predict_proba 
proba = clsf.predict_proba(X2) 

# want to get class names back 
classnames = mlb.inverse_transform(clsf.classes_) # booom, shit happens 

for i in range(len(proba)): 
    # get classnames,probability dict 
    preDict = dict(zip(classnames, proba[i])) 
    # sort dict by probability value, print actual and top 5 predict results 
    print(Y2[i], dict(sorted(preDict.items(),key=lambda d:d[1],reverse=True)[0:5])) 

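The problem is that after clsf.fit(X1, Y1), clsf.classes_ is an int array [0, 1, 2, 3, ..., 1362].

Why doesn't it look like Y1? How do I get the class names from clsf.classes_? Do mlb.classes_ and clsf.classes_ correspond to each other, in the same order?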

Answer


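When you fit a OneVsRestClassifier with multiple labels, a LabelBinarizer is fitted during the fit call, and it turns the multilabel targets into one binary target per class. Because Y1 is already a binary indicator matrix at that point, the classes it records are the column indices, which is why clsf.classes_ is an int array.

You can access the label_binarizer_ attribute of the clsf object; its classes_ attribute holds the class definitions that clsf was fitted with.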
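To make the column correspondence concrete, here is a minimal sketch on toy data. The feature matrix and label lists are made up for illustration, and the variable names simply mirror the question. It assumes the classifier is fitted on the output of the same MultiLabelBinarizer, in which case column j of Y, and therefore column j of the predict_proba output, belongs to mlb.classes_[j].

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import MultiLabelBinarizer

# Toy data (hypothetical): binary features and one list of labels per sample
X = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]])
labels = [["AAA"], ["AAA", "BBB"], ["AAC"], ["AAC", "BBB"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # indicator matrix, one column per label

clsf = OneVsRestClassifier(BernoulliNB(alpha=0.001))
clsf.fit(X, Y)

print(mlb.classes_)                   # label names, in sorted order
print(clsf.classes_)                  # column indices of Y, not names
print(clsf.label_binarizer_.classes_) # the same column indices

# One probability column per label, in the column order of Y
proba = clsf.predict_proba(X)
print(proba.shape)
```

So the names come from mlb.classes_, while clsf.classes_ only indexes into them.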


Thanks! 'label_binarizer_' is exactly what I need: 'bitarray = clsf.label_binarizer_.inverse_transform(proba, threshold=0.5)' and then 'classnames = mlb.inverse_transform(bitarray)'. But clsf.predict_proba(X2) seems to return a probability for each binary classifier, e.g. – Leowan


i.e. '[('AAA','BBB',)]', 'firstResut = mlb.inverse_transform(np.array([bitarray[0]]))'. How do I get the probability for each label? – Leowan
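One possible way to get a probability per label is to pair each row of the probability matrix with the label names directly, rather than thresholding first. The sketch below uses hard-coded stand-ins for mlb.classes_ and for the output of clsf.predict_proba(X2). The helper top_labels is a hypothetical name, not a scikit-learn function. It assumes the probability columns are in the same order as mlb.classes_, which should hold when the classifier was fitted on the output of mlb.transform.

```python
import numpy as np

# Hypothetical stand-ins for mlb.classes_ and clsf.predict_proba(X2)
class_names = np.array(["AAA", "AAC", "BBB"])
proba = np.array([[0.90, 0.05, 0.60],
                  [0.10, 0.80, 0.30]])

def top_labels(row, names, k=5):
    """Return the k most probable (label, probability) pairs for one sample."""
    order = np.argsort(row)[::-1][:k]  # column indices, highest probability first
    return [(str(names[j]), float(row[j])) for j in order]

for row in proba:
    print(top_labels(row, class_names, k=2))
```

With the real objects this would be called as top_labels(proba[i], mlb.classes_), which replaces the dict-and-sort step in the question's loop.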