2017-02-09 450 views

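Keras classifier predict_proba() does not match predict()

I am working with a Keras NN on the Theano backend, on a classification problem with 14 output classes. I want the predicted class plus the associated probabilities. The problem is that the probabilities from predict_proba() do not seem to match the class predicted by predict(). Below is the code, followed by the output for one sample.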

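# imports inferred from the names used below (Keras 1.x-era API)
import pprint

import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import train_test_split

PPRANK = ['pp1', 'pp2', 'pp3', 'pp4', 'pp5', 'pp6', 'pp7', 'pp8', 'pp9', 'pp10', 'pp11', 'pp12', 'pp13', 'pp14', 'pp15']

FEATURES = PPRANK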

# fix random seed for reproducibility 
seed = 7 
np.random.seed(seed) 

# read_csv with these arguments matches the old DataFrame.from_csv defaults
data_df = pd.read_csv("data.csv", index_col=0, parse_dates=True)
X = np.array(data_df[FEATURES].values) 
Y = (data_df["bres"].replace(14,13).values) 


# define baseline model 
def baseline_model(): 
    # create model 
    model = Sequential() 
    model.add(Dense(8, input_dim=(len(FEATURES)), init='normal', activation='relu')) 
    model.add(Dense(14, init='normal', activation='softmax')) 
    # Compile model 
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy']) 
    return model 
#build model 
estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0) 

#split train and test 
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=seed) 
estimator.fit(X_train, Y_train) 

#get probabilities 
predictions = estimator.predict_proba(X_test) 

# round each probability to 4 decimals so it prints without exponent notation
probs = [[round(float(p), 4) for p in row] for row in predictions]

# pprint probabilities 
pp = pprint.PrettyPrinter(indent=0) 
pp.pprint(probs) 

# print class predictions, then the true labels
print(estimator.predict(X_test))
print(Y_test)

Probabilities

[0.00000,0.00030,0.02360,0.04329,0.00019,0.00069,0.00120,0.00030,0.00559,0.00410,0.00510,0.91549,0.0,0.0]

Predicted class

Actual class

It shows class 12 as having the highest probability from predict_proba(), rather than the 11 returned by predict(). Thanks for any help.

Answer


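Python array indices (and the class labels here) start counting from 0, not from 1. Look again: 0.91 is the 12th value as a person counts, but it sits at index 11, so predict and predict_proba agree.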
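
To make the counting concrete, here is a minimal sketch that takes the probability row from the question's sample output and asks NumPy for the position of the largest value. The names `probs`, `zero_based` and `one_based` are only illustrative; nothing here calls Keras.

```python
import numpy as np

# Probability row copied from the question's sample output (14 classes).
probs = np.array([0.00000, 0.00030, 0.02360, 0.04329, 0.00019, 0.00069, 0.00120,
                  0.00030, 0.00559, 0.00410, 0.00510, 0.91549, 0.0, 0.0])

zero_based = int(np.argmax(probs))  # index as Python counts it
one_based = zero_based + 1          # position as a person counts it

print(zero_based)  # 11
print(one_based)   # 12
```

The zero-based index is the value to compare against the output of predict().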

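As for why it isn't 13, the prediction may simply be wrong (but check that you haven't made the same kind of off-by-one error there).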
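
One way to run that check, as a rough sketch: with a 14-unit softmax and sparse_categorical_crossentropy, integer labels index the output units, so they should lie in 0..13. The helper `labels_in_range` and both label arrays below are hypothetical and are not taken from the question's data.csv.

```python
import numpy as np

def labels_in_range(y, n_classes):
    """Return True if every label lies in 0..n_classes-1."""
    y = np.asarray(y)
    return bool(y.min() >= 0 and y.max() <= n_classes - 1)

# Hypothetical label arrays for illustration only.
ok_labels = np.array([0, 3, 11, 13])
shifted_labels = np.array([1, 4, 12, 14])  # counted from 1 by mistake

print(labels_in_range(ok_labels, 14))
print(labels_in_range(shifted_labels, 14))
```

In the question's script the same check could be applied to Y right after it is loaded, for example `labels_in_range(Y, 14)`.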