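How do I use Gaussian processes for binary classification?
I know that Gaussian process models are better suited to regression than to classification. However, I would still like to apply a Gaussian process to a classification task, and I am not sure what the best way is to bin the predictions generated by the model into classes. I have reviewed the Gaussian process classification example available on the scikit-learn website.
However, I found that example confusing (I list the things I found confusing at the end of this question). To try to get a better understanding, I have created a very basic Python code example using scikit-learn that produces classifications by applying a decision boundary to the predictions made by a Gaussian process: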
# A minimal example illustrating how to use a
# Gaussian process for binary classification
import numpy as np
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.gaussian_process import GaussianProcess

if __name__ == "__main__":
    # Define some basic training and test data.
    # If the descriptive features have large values
    # (i.e., 8s and 9s) the target is 1.
    # If the descriptive features have small values
    # (i.e., 2s and 3s) the target is 0.
    TRAININPUTS = np.array([[8, 9, 9, 9, 9],
                            [9, 8, 9, 9, 9],
                            [9, 9, 8, 9, 9],
                            [9, 9, 9, 8, 9],
                            [9, 9, 9, 9, 8],
                            [2, 3, 3, 3, 3],
                            [3, 2, 3, 3, 3],
                            [3, 3, 2, 3, 3],
                            [3, 3, 3, 2, 3],
                            [3, 3, 3, 3, 2]])
    TRAINTARGETS = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
    TESTINPUTS = np.array([[8, 8, 9, 9, 9],
                           [9, 9, 8, 8, 9],
                           [3, 3, 3, 3, 3],
                           [3, 2, 3, 2, 3],
                           [3, 2, 2, 3, 2],
                           [2, 2, 2, 2, 2]])
    TESTTARGETS = np.array([1, 1, 0, 0, 0, 0])
    DECISIONBOUNDARY = 0.5

    # Fit a Gaussian process model to the data
    # (note that theta0=10e-1 is the same as 1.0)
    gp = GaussianProcess(theta0=10e-1, random_start=100)
    gp.fit(TRAININPUTS, TRAINTARGETS)

    # Generate a set of predictions for the test data
    y_pred = gp.predict(TESTINPUTS)
    print("Predicted Values:")
    print(y_pred)
    print("----------------")

    # Convert the continuous predictions into classes
    # by splitting on a decision boundary of 0.5
    predictions = []
    for y in y_pred:
        if y > DECISIONBOUNDARY:
            predictions.append(1)
        else:
            predictions.append(0)
    print("Binned Predictions (decision boundary = 0.5):")
    print(predictions)
    print("----------------")

    # Print out the confusion matrix, specifying 1 as the positive class
    cm = confusion_matrix(TESTTARGETS, predictions, labels=[1, 0])
    print("Confusion Matrix (1 as positive class):")
    print(cm)
    print("----------------")
    print("Classification Report:")
    print(metrics.classification_report(TESTTARGETS, predictions))
When I run this code I get the following output:
Predicted Values:
[ 0.96914832 0.96914832 -0.03172673 0.03085167 0.06066993 0.11677634]
----------------
Binned Predictions (decision boundary = 0.5):
[1, 1, 0, 0, 0, 0]
----------------
Confusion Matrix (1 as positive class):
[[2 0]
 [0 4]]
----------------
Classification Report:
             precision    recall  f1-score   support

          0       1.00      1.00      1.00         4
          1       1.00      1.00      1.00         2

avg / total       1.00      1.00      1.00         6
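The binning loop in the code above can also be written without an explicit loop. The following is a minimal sketch of that step on its own, assuming the predicted values printed above are reused as a plain NumPy array; the names `y_pred`, `test_targets`, `predictions`, and `accuracy` are local to this sketch.

```python
import numpy as np

# Predicted values copied from the output above
y_pred = np.array([0.96914832, 0.96914832, -0.03172673,
                   0.03085167, 0.06066993, 0.11677634])
test_targets = np.array([1, 1, 0, 0, 0, 0])
DECISIONBOUNDARY = 0.5

# Boolean mask (prediction above the boundary) converted to 0/1 class labels
predictions = (y_pred > DECISIONBOUNDARY).astype(int)

# Fraction of test instances whose binned prediction matches the target
accuracy = float(np.mean(predictions == test_targets))

print(predictions.tolist())
print(accuracy)
```

The comparison is strict (`>`), so a prediction exactly equal to the boundary is assigned to class 0, the same as in the loop version.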
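The approach used in this basic example seems to work for this simple dataset. However, it is very different from the classification example on the scikit-learn website that I mentioned above.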
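So I am wondering whether I am missing something here. I would appreciate it if anyone could:
1. With respect to the classification example given on the scikit-learn website:
1.1 Explain what the probabilities generated in this example are probabilities of. Are they the probability that the query instance belongs to the class > 0?
1.2 Explain why the example uses a cumulative density function rather than a probability density function.
1.3 Explain why the example divides the predictions made by the model by the square root of the mean squared error before they are passed to the cumulative density function.
2. With respect to the basic code example I have listed here, clarify whether applying a simple decision boundary to the predictions generated by a Gaussian process model is an appropriate way to do binary classification.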
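To make questions 1.2 and 1.3 concrete, here is my reading of the computation being asked about, as a minimal sketch and not the code from the scikit-learn example. It assumes the prediction at a point is treated as a normal distribution whose mean is the predicted value and whose variance is the mean squared error. Under that assumption, dividing by the square root of the MSE standardizes the prediction, and the cumulative density function then gives the probability of landing on one side of a threshold. The helpers `normal_cdf` and `prob_above` and the MSE values are hypothetical; only the two means come from the output above.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_above(mean, mse, threshold=0.0):
    """P(G > threshold) for G ~ Normal(mean, mse), where mse is the variance."""
    sigma = math.sqrt(mse)  # standard deviation = square root of the MSE
    return normal_cdf((mean - threshold) / sigma)


# Two predicted means copied from the output above; the MSE values are
# hypothetical placeholders, not results produced by the model.
means = [0.96914832, -0.03172673]
mses = [0.01, 0.01]

for mean, mse in zip(means, mses):
    print(mean, prob_above(mean, mse, threshold=0.5))
```

Under this reading, a larger MSE pulls the probability toward 0.5 for the same predicted mean, which is information a fixed decision boundary on the mean alone does not use.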
Sorry for such a long question, and thanks for any help.