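Strange ROC curve prediction

I have predictions from an SVM model (prediction_svm_linear) and I want to plot a ROC curve with the pROC package in R. I get an AUC of 100%, which should not be possible, because according to the confusion matrix my predictions are not perfect. I am clearly missing something, and perhaps I do not fully understand how ROC curves work. Could you explain why this happens?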

Confusion Matrix and Statistics

          Reference
Prediction Cancer Normal
    Cancer     11      0
    Normal      3      5

               Accuracy : 0.8421
                 95% CI : (0.6042, 0.9662)
    No Information Rate : 0.7368
    P-Value [Acc > NIR] : 0.2227

                  Kappa : 0.6587
 Mcnemar's Test P-Value : 0.2482

            Sensitivity : 0.7857
            Specificity : 1.0000
         Pos Pred Value : 1.0000
         Neg Pred Value : 0.6250
             Prevalence : 0.7368
         Detection Rate : 0.5789
   Detection Prevalence : 0.5789
      Balanced Accuracy : 0.8929

       'Positive' Class : Cancer

Here is my code:

library(pROC)
    # Ground truth as a factor, so that levels() works in the roc() call below
    testData_class = factor(rep(c("Normal", "Cancer"), c(5, 14)))
    testData = data.frame(class = testData_class)
    prediction_svm_linear = data.frame(Cancer = c(0.11766249, 0.04765463, 0.08749940, 0.01715765, 0.10755376, 0.28358435, 0.37478957, 0.90603193, 0.91077112, 0.68602820, 0.64783894, 0.67916187, 0.38785763, 0.66440580, 0.51897036, 0.93484214, 0.91719866, 0.83239007, 0.63491027), Normal = c(0.88233751, 0.95234537, 0.91250060, 0.98284235, 0.89244624, 0.71641565, 0.62521043, 0.09396807, 0.08922888, 0.31397180, 0.35216106, 0.32083813, 0.61214237, 0.33559420, 0.48102964, 0.06515786, 0.08280134, 0.16760993, 0.36508973))

    # rev(levels(...)) gives c("Normal", "Cancer"): controls first, then cases
    result.roc.model1 <- roc(testData$class, prediction_svm_linear$Cancer,
          levels = rev(levels(testData$class)))


>result.roc.model1 
Call: 
roc.default(response = testData$class, predictor = prediction.prob.b5_svm_linear$Cancer,  levels = rev(levels(testData$class))) 

Data: prediction.prob.b5_svm_linear$Cancer in 5 controls (testData$class Normal) < 14 cases (testData$class Cancer). 
Area under the curve: 1

2016-03-17 Mati

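What threshold is the confusion matrix based on? Can you show how you generated it? Apparently the ROC curve is telling you that there is a better threshold... – Calimo

I did not set any threshold for the confusion matrix. This is the code: 'confusionMatrix(testData_class, prediction_svm_linear)' – Mati

From the caret package? – Calimo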

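From your comments, I suspect that you are misusing the confusionMatrix function from the caret package. According to the documentation, the second argument should be "a factor of classes to be used as the true results", whereas your comment suggests that you passed a data.frame of continuous predictions there. Not only is that not the required format, the predictions should also be your first argument.

You should use something like this instead: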

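# Threshold the probabilities; caret expects two factors with identical levels
predictions <- factor(ifelse(prediction_svm_linear$Cancer > 0.2, "Cancer", "Normal"), levels = c("Cancer", "Normal"))
confusionMatrix(predictions, factor(testData_class, levels = c("Cancer", "Normal")))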
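
To see how much the cutoff matters, here is a minimal sketch in base R (no caret needed) that tabulates the posted probabilities at two thresholds. The 0.5 cutoff is an assumption about how the class predictions were derived: it reproduces the posted confusion matrix, but the question does not state it. The helper name confusion_at is made up for illustration.

```r
# Data copied from the question: 5 Normal samples followed by 14 Cancer samples.
truth <- factor(rep(c("Normal", "Cancer"), c(5, 14)),
                levels = c("Cancer", "Normal"))
p_cancer <- c(0.11766249, 0.04765463, 0.08749940, 0.01715765, 0.10755376,
              0.28358435, 0.37478957, 0.90603193, 0.91077112, 0.68602820,
              0.64783894, 0.67916187, 0.38785763, 0.66440580, 0.51897036,
              0.93484214, 0.91719866, 0.83239007, 0.63491027)

# Hypothetical helper: cross-tabulate the predictions made at a given cutoff.
confusion_at <- function(cutoff) {
  pred <- factor(ifelse(p_cancer > cutoff, "Cancer", "Normal"),
                 levels = c("Cancer", "Normal"))
  table(Prediction = pred, Reference = truth)
}

cm_050 <- confusion_at(0.5)  # assumed cutoff behind the posted matrix
cm_020 <- confusion_at(0.2)  # cutoff suggested in this answer

print(cm_050)
print(cm_020)

# Accuracy = correctly classified samples / all samples.
acc_050 <- sum(diag(cm_050)) / sum(cm_050)
acc_020 <- sum(diag(cm_020)) / sum(cm_020)
print(c(acc_050, acc_020))
```

Under that assumption, the three misclassified Cancer samples are the ones whose Cancer probability falls between the two cutoffs, so the confusion matrix describes one operating point while the ROC curve summarizes all of them.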

2016-03-18 08:26:29 Calimo

Sorry, I may have confused you, but here is all the information.

Binary prediction:

prediction_svm = c("Normal", "Normal", "Normal", "Normal", "Normal", "Normal", "Normal", "Cancer", "Cancer", "Cancer", "Cancer", "Cancer", "Normal", "Cancer", "Cancer", "Cancer", "Cancer", "Cancer", "Cancer")

Ground truth:

testData_class = factor(rep(c("Normal", "Cancer"), c(5, 14)))
testData = data.frame(class = testData_class)

Probability predictions:

prediction_svm_linear.prob = data.frame(Cancer = c(0.11766249, 0.04765463, 0.08749940, 0.01715765, 0.10755376, 0.28358435, 0.37478957, 0.90603193, 0.91077112, 0.68602820, 0.64783894, 0.67916187,0.38785763, 0.66440580, 0.51897036, 0.93484214, 0.91719866, 0.83239007, 0.63491027), Normal = c(0.88233751, 0.95234537, 0.91250060, 0.98284235, 0.89244624, 0.71641565, 0.62521043, 0.09396807, 0.08922888, 0.31397180, 0.35216106, 0.32083813,0.61214237, 0.33559420, 0.48102964, 0.06515786, 0.08280134, 0.16760993, 0.36508973))

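I built the confusion matrix with this command:

# caret expects the predictions as a factor with the same levels as the reference
confusionMatrix(factor(prediction_svm, levels = levels(testData$class)), testData$class)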

library(pROC) 
    result.roc.model1 <- roc(testData$class, prediction_svm_linear.prob$Cancer, 
          levels = rev(levels(testData$class))) 


>result.roc.model1 
Call: 
roc.default(response = testData$class, predictor = prediction.prob.b5_svm_linear$Cancer,  levels = rev(levels(testData$class))) 

Data: prediction.prob.b5_svm_linear$Cancer in 5 controls (testData$class Normal) < 14 cases (testData$class Cancer). 
Area under the curve: 1 


>result.coords.model1 <- coords( result.roc.model1, "best", best.method="closest.topleft",ret=c("threshold", "accuracy")) 

>result.coords.model1

threshold accuracy 
0.2006234 1.0000000

2016-03-18 12:56:54 Mati
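
The AUC of 1 and the reported best threshold can be checked by hand. The following is a minimal sketch in base R, assuming the usual reading of the AUC as the fraction of (case, control) pairs in which the case receives the higher score, with ties counted as one half. The variable names are illustrative, and this is not meant to mirror how pROC computes the value internally.

```r
# Cancer probabilities copied from the post: 5 Normal samples, then 14 Cancer samples.
p_cancer <- c(0.11766249, 0.04765463, 0.08749940, 0.01715765, 0.10755376,
              0.28358435, 0.37478957, 0.90603193, 0.91077112, 0.68602820,
              0.64783894, 0.67916187, 0.38785763, 0.66440580, 0.51897036,
              0.93484214, 0.91719866, 0.83239007, 0.63491027)
is_cancer <- rep(c(FALSE, TRUE), c(5, 14))

cases    <- p_cancer[is_cancer]   # Cancer samples
controls <- p_cancer[!is_cancer]  # Normal samples

# Compare every case with every control (a 14 x 5 grid of pairs).
wins <- outer(cases, controls, ">")
ties <- outer(cases, controls, "==")
auc_manual <- mean(wins + 0.5 * ties)

# The two score ranges do not overlap, so any cutoff inside the gap
# separates the classes without error.
gap <- c(highest_control = max(controls), lowest_case = min(cases))
midpoint <- mean(gap)

print(auc_manual)
print(gap)
print(midpoint)
```

The midpoint of the gap agrees with the threshold of 0.2006234 returned by coords() above. Every Cancer sample scores higher than every Normal sample, so the ranking is perfect even though the posted class predictions contain three errors.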
