Calculating the AUC of the training set for randomForest in two different ways gives me different results?

I am using two ways to calculate the AUC of the training set for a randomForest model, but I get very different results. Both ways use the same fitted model:

rfmodel <- randomForest(y~., data=train, importance=TRUE, ntree=1000)

Way 1 of calculating the AUC of the training set:

rf_p_train <- predict(rfmodel, type="prob", newdata = train)[,'yes']
rf_pr_train <- prediction(rf_p_train, train$y)
r_auc_train[i] <- performance(rf_pr_train, measure = "auc")@y.values[[1]]

Way 2 of calculating the AUC of the training set:

rf_p_train <- as.vector(rfmodel$votes[,2])
rf_pr_train <- prediction(rf_p_train, train$y)
r_auc_train[i] <- performance(rf_pr_train, measure = "auc")@y.values[[1]]

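Way 1 gives me an AUC of about 1, but way 2 gives an AUC of around 0.65. I am wondering why these two results differ so much. Could anyone help me with this? I would really appreciate it. As for the data, I am sorry that I cannot share it here. This is my first time asking a question here, so please forgive me if anything is unclear. Thanks a lot!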

2017-10-07 annadai

I'm not sure what data you are using. It would be best if you provided a reproducible example, but I think I was able to piece one together:

library(randomForest) 
#install.packages("ModelMetrics") 
library(ModelMetrics) 

# prep training to binary outcome 
train <- iris[iris$Species %in% c('virginica', 'versicolor'),] 
train$Species <- droplevels(train$Species) 

# build model 
rfmodel <- randomForest(Species~., data=train, importance=TRUE, ntree=2) 

# generate predictions 
preds <- predict(rfmodel, type="prob",newdata = train)[,2] 

# Calculate AUC 
auc(train$Species, preds) 

# Calculate LogLoss 
logLoss(train$Species, preds)
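
The gap described in the question can be reproduced on this iris subset. The following is a minimal sketch, assuming the ROCR package used in the question is installed; the helper `auc_rocr`, the seed, and `ntree = 500` are illustrative choices, not from the original post. `predict(..., newdata = train)` scores each row with every tree, including trees that were fitted on that row, while `rfmodel$votes` holds the out-of-bag vote fractions, which come only from trees that did not see the row. The first number is therefore optimistic, and the second is closer to a held-out estimate.

```r
library(randomForest)
library(ROCR)

set.seed(42)  # arbitrary seed, only for reproducibility

# Same two-class iris subset as in the example above
train <- iris[iris$Species %in% c("virginica", "versicolor"), ]
train$Species <- droplevels(train$Species)

# Enough trees that every row is out-of-bag for many of them
rfmodel <- randomForest(Species ~ ., data = train, ntree = 500)

# Hypothetical helper: AUC from scores and true labels via ROCR
auc_rocr <- function(scores, labels) {
  performance(prediction(scores, labels), measure = "auc")@y.values[[1]]
}

# Way 1: resubstitution, all trees vote on rows they may have been trained on
p_resub <- predict(rfmodel, type = "prob", newdata = train)[, 2]

# Way 2: out-of-bag vote fractions stored on the fitted model
p_oob <- as.vector(rfmodel$votes[, 2])

auc_resub <- auc_rocr(p_resub, train$Species)
auc_oob <- auc_rocr(p_oob, train$Species)

print(c(resubstitution = auc_resub, oob = auc_oob))
```

On an easy dataset such as iris the two values are close. On noisier data the resubstitution AUC can approach 1 while the out-of-bag AUC stays much lower, which matches the pattern reported in the question.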

2017-10-07 15:48:36 JackStat

Thanks! But my problem is still not solved. Could you try calculating the AUC of the training data in the following two ways? 1. `rf_p_train <- predict(rfmodel, type="prob", newdata = train)[,2]; rf_pr_train <- prediction(rf_p_train, train$Species); performance(rf_pr_train, measure = "auc")@y.values[[1]]` 2. `rf_p_train <- as.vector(rfmodel$votes[,2]); rf_pr_train <- prediction(rf_p_train, train$Species); r_auc_train[i] <- performance(rf_pr_train, measure = "auc")@y.values[[1]]` They give two different AUCs, and the first is higher than the second. – annadai

Sorry, I am not familiar with how to use Stack Overflow and I really need help. Thanks a lot! – annadai
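
One way to check that way 2 in the comment above is an out-of-bag estimate is to call `predict` without `newdata`. The randomForest documentation notes that when `newdata` is not given, the out-of-bag prediction stored in the object is returned. The sketch below assumes the same iris subset as the answer; the seed and `ntree = 500` are illustrative choices.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only for reproducibility

train <- iris[iris$Species %in% c("virginica", "versicolor"), ]
train$Species <- droplevels(train$Species)

rfmodel <- randomForest(Species ~ ., data = train, ntree = 500)

# No newdata: out-of-bag class probabilities, one row per training row
p_no_newdata <- as.vector(predict(rfmodel, type = "prob")[, 2])

# Vote fractions stored on the model (way 2 in the comment)
p_votes <- as.vector(rfmodel$votes[, 2])

# Passing the training data explicitly (way 1) uses all trees instead
p_newdata <- as.vector(predict(rfmodel, type = "prob", newdata = train)[, 2])

same_as_votes <- isTRUE(all.equal(p_no_newdata, p_votes))
print(same_as_votes)
```

So the two ways in the comment measure different things: way 1 measures fit to data the trees have already seen, and way 2 measures out-of-bag performance.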
