2015-06-30 226 views

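I understand that a ROC curve is plotted as tpr against fpr, but I am having difficulty deciding which parameters I should vary to get different tpr/fpr pairs. ROC, random forest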

Answer

1

I wrote this answer to a similar question.

Basically, you can (1) increase the weight of certain classes, (2) downsample other classes, and/or (3) change the vote aggregation rule.
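As a rough illustration of those three options, here is a minimal sketch using the `classwt`, `strata`/`sampsize` and `cutoff` arguments of randomForest. The toy data frame `toy` and every object name below are hypothetical and only serve the example; the weights, sample sizes and cutoff values are arbitrary choices, not recommendations.

```r
library(randomForest)  # same package the answer uses

set.seed(1)
# Hypothetical toy data: two roughly balanced classes, one informative feature
n <- 400
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- factor(ifelse(toy$x1 + rnorm(n, sd = 0.5) > 0, "green", "red"),
                levels = c("red", "green"))

# Option 1: class weights (priors), in the order of the factor levels (red, green)
rf_wt <- randomForest(y ~ ., data = toy, classwt = c(1, 3))

# Option 2: stratified sampling, each tree draws 50 red and 150 green rows
rf_ds <- randomForest(y ~ ., data = toy, strata = toy$y, sampsize = c(50, 150))

# Option 3: change the vote aggregation rule at prediction time.
# The winning class maximises (vote fraction / cutoff), so with c(0.75, 0.25)
# red is predicted only when more than 75% of the trees vote red.
rf_default <- randomForest(y ~ ., data = toy)
pred_default    <- predict(rf_default, toy)
pred_red_strict <- predict(rf_default, toy, cutoff = c(0.75, 0.25))

# Red should be predicted no more often under the stricter rule
print(c(default = mean(pred_default == "red"),
        strict  = mean(pred_red_strict == "red")))
```

Each option moves the classifier to a different point on the ROC curve; repeating any of them over a range of values traces out several tpr/fpr pairs.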

[EDITED 13:15 CEST, 1 July 2015] @"Both classes are very balanced - Suryavansh"

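In this case, where your data is balanced, you should mainly go for option 3 (changing the aggregation rule). In randomForest this is available through the cutoff parameter, either at training or at prediction time. In other settings you may need to extract the cross-validated votes from all trees yourself, apply a range of rules, and compute the resulting fpr and fnr.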
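The manual route described above can be sketched in base R roughly as follows. The function `roc_points` and the simulated vote fractions are hypothetical stand-ins; with a fitted randomForest classifier, the out-of-bag vote fractions in the `votes` component (for example the column for "red") and the true classes could presumably be passed in instead.

```r
# Sweep a vote-fraction threshold and record the rates at each step.
# votes_pos: fraction of trees voting for the positive class, per observation
# truth:     logical vector, TRUE where the observation is truly positive
roc_points <- function(votes_pos, truth, thresholds = seq(0, 1, by = 0.1)) {
  rates <- sapply(thresholds, function(thr) {
    pred_pos <- votes_pos >= thr          # aggregation rule: positive if enough votes
    c(threshold = thr,
      tpr = sum(pred_pos & truth) / sum(truth),
      fpr = sum(pred_pos & !truth) / sum(!truth))
  })
  out <- as.data.frame(t(rates))
  out$fnr <- 1 - out$tpr                  # false negative rate
  out
}

# Hypothetical stand-in for out-of-bag vote fractions, clamped to [0, 1]
set.seed(42)
truth <- rep(c(TRUE, FALSE), each = 500)
votes_pos <- pmin(pmax(rnorm(1000, mean = ifelse(truth, 0.7, 0.3), sd = 0.2), 0), 1)

pts <- roc_points(votes_pos, truth)
print(pts)
```

Each row of `pts` is one tpr/fpr pair, so plotting `pts$fpr` against `pts$tpr` gives a coarse ROC curve, and any row can be picked as the operating point.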

library(randomForest) 
library(AUC) 

#some balanced data generator 
make.data = function(obs=5000,vars=6,noise.factor = .4) { 
    X = data.frame(replicate(vars,rnorm(obs))) 
    yValue = with(X,sin(X1*pi)+sin(X2*pi*2)^3+rnorm(obs)*noise.factor) 
    yClass = (yValue<median(yValue))*1 
    yClass = factor(yClass,labels=c("red","green")) 
    print(table(yClass)) #two classes, split at the median so they are balanced 
    data.frame(X=X,y=yClass) #returned to the caller 
} 

#plot true class separation 
Data = make.data() 
par(mfrow=c(1,1)) 
plot(Data[,1:2],main="separation problem: predict red/green class", 
    col = c("#FF000040","#00FF0040")[as.numeric(Data$y)]) 


#train default RF 
rf1 = randomForest(y~.,Data) 
#you can choose a given threshold from this ROC plot 
plot(roc(rf1$votes[,1],rf1$y),main="choose a threshold from this curve") 


#create a test data set from the same generator 
testData = make.data() 


#predict with various cutoffs 
predTable = data.frame(
    trueTest = testData$y, 
    majorityVote = predict(rf1,testData), 
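    #cutoff=c(.3,.1): red wins only with >75% of the votes, so red is predicted 
    #less often; roughly 3 times more true reds are misclassified as green 
    Pred.alot.Red = factor(predict(rf1,testData,cutoff=c(.3,.1))), 
    #cutoff=c(.1,.3): red wins with >25% of the votes, so red is predicted 
    #more often; roughly 3 times more true greens are misclassified as red 
    Pred.afew.Red = factor(predict(rf1,testData,cutoff=c(.1,.3))) 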
) 

#see confusion tables (rows = true class, columns = predicted class) 
table(predTable[,c(1,2)])/5000 
        majorityVote 
trueTest    red  green 
   red   0.4238 0.0762 
   green 0.0818 0.4182 

table(predTable[,c(1,3)])/5000 
        Pred.alot.Red 
trueTest    red  green 
   red   0.2902 0.2098 
   green 0.0158 0.4842 

table(predTable[,c(1,4)])/5000 
        Pred.afew.Red 
trueTest    red  green 
   red   0.4848 0.0152 
   green 0.2088 0.2912 
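To connect these tables back to the ROC question, each cutoff corresponds to one tpr/fpr pair. A small sketch of that arithmetic, assuming "red" is treated as the positive class (that choice, and the object names `conf` and `rates`, are mine and not part of the original output):

```r
# Confusion tables copied from the output above
# (rows = true class red/green, columns = predicted class red/green)
conf <- list(
  majorityVote  = matrix(c(0.4238, 0.0762, 0.0818, 0.4182), 2, byrow = TRUE),
  Pred.alot.Red = matrix(c(0.2902, 0.2098, 0.0158, 0.4842), 2, byrow = TRUE),
  Pred.afew.Red = matrix(c(0.4848, 0.0152, 0.2088, 0.2912), 2, byrow = TRUE)
)

rates <- t(sapply(conf, function(m) {
  c(tpr = m[1, 1] / sum(m[1, ]),   # true red predicted red / all true red
    fpr = m[2, 1] / sum(m[2, ]))   # true green predicted red / all true green
}))
print(round(rates, 4))
```

Under that assumption the three cutoffs give three distinct points on the ROC curve, with cutoff=c(.3,.1) at the low-fpr end and cutoff=c(.1,.3) at the high-tpr end.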


Both classes are well balanced – Suryavanshi