How to get the importance of a variable's categories using random forest?

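I am using the randomForest package on my data set for classification, but the importance command only gives me the importance of each variable. What if I want the importance of specific categories within a variable? For example, for a specific location in a region variable, how much does that location affect the total? I thought about turning every category into a dummy variable, but I don't know whether that is a good idea.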
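To make the dummy-variable idea concrete, here is a minimal sketch on made-up data. The data frame `toy`, its columns `region`, `income` and `y`, and the object names are all hypothetical and only serve as illustration. It expands the factor into one indicator column per level with `model.matrix` and then reads the usual permutation importance per indicator.

```r
library(randomForest)

set.seed(1)
# Hypothetical toy data: 'region' is a factor with three levels
n <- 300
toy <- data.frame(
  region = factor(sample(c("north", "south", "west"), n, replace = TRUE)),
  income = rnorm(n)
)
# The outcome depends mostly on whether region == "north"
toy$y <- factor(ifelse((toy$region == "north") + rnorm(n, sd = 0.5) > 0.5, "yes", "no"))

# One indicator column per level (no intercept, so no level is dropped)
dummies <- model.matrix(~ region - 1, data = toy)
X.dummy <- data.frame(dummies, income = toy$income)

# importance = TRUE is needed to get the permutation-based measure
rf.dummy <- randomForest(X.dummy, toy$y, importance = TRUE)
imp <- importance(rf.dummy, type = 1)  # type 1 = mean decrease in accuracy
print(imp)
```

Note that the indicator columns are mutually exclusive and therefore correlated, so the importance of a single indicator is probably best read as a rough guide rather than a clean per-category effect.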

2015-12-03 Lucas Meireles

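I think you mean "variable importance for specific categories of a variable". This is not implemented yet, but I think it would be possible, meaningful, and perhaps useful. For a variable with only two categories it would of course make no sense.

I would implement it something like this: train the model -> compute the out-of-bag prediction performance (OOB-cv1) -> permute a specific category of a specific variable (randomly reassign the observations in this category to the other categories, weighted by the frequency of those categories) -> recompute the out-of-bag prediction performance (OOB-cv2) -> subtract OOB-cv1 from OOB-cv2.

I then wrote a function that implements category-specific variable importance.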

library(randomForest) 

#Create a classification problem with mixed categorical and numeric vars 
#Cat A of var 1, cat B of var 2 and cat C of var 3 influence the class the most. 
X.cat = replicate(3,sample(c("A","B","C"),600,replace=TRUE)) 
X.val = replicate(2,rnorm(600)) 
y.cat = 3*(X.cat[,1]=="A") + 3*(X.cat[,2]=="B") + 3*(X.cat[,3]=="C") 
y.cat.err = y.cat+rnorm(600) 
#split the noisy score at its tertiles to get three classes of similar size 
y.lim = quantile(y.cat.err,c(1/3,2/3)) 
y.class = sapply(y.cat.err,function(x) sum(x>y.lim)+1) 
y.class = factor(y.class,labels=c("ann","bob","chris")) 
X.full = data.frame(X.cat,X.val) 
X.full[1:3] = lapply(X.full[1:3],as.factor) 

#train forest; keep.inbag=TRUE stores which rows each tree was trained on 
rf=randomForest(X.full,y.class,keep.inbag=TRUE,replace=TRUE) 

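#make function to compute the out-of-bag (cross-validated) classification error 
oobErr = function(rf,X) { 
    #per-tree class predictions: one row per observation, one column per tree 
    preds = predict(rf,X,type="vote",predict.all=TRUE)$individual 
    #discard predictions from trees that had the observation in their bag 
    preds[rf$inbag!=0]=NA 
    #majority vote over the remaining (out-of-bag) trees 
    oob.pred = apply(preds,1,function(x) { 
        tabx=sort(table(x),decreasing=TRUE) 
        names(tabx)[1] 
    }) 
    return(mean(as.character(rf$y)!=oob.pred)) 
} 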

#make function to iterate over all categories of the categorical variables 
#and compute the change in OOB class error due to permutation of each category 
catVar = function(rf,X,nPerm=2) { 
    ref = oobErr(rf,X) #baseline OOB error 
    catVars = which(rf$forest$ncat>1) #indices of the categorical predictors 
    lapply(catVars, function(iVar) { 
        catImp = replicate(nPerm,{ 
            sapply(levels(X[[iVar]]), function(thisCat) { 
                thisCat.ind = which(thisCat==X[[iVar]]) 
                #overwrite the rows of this category with values drawn from the shuffled column 
                X[thisCat.ind,iVar] = head(sample(X[[iVar]]),length(thisCat.ind)) 
                oobErr(rf,X)-ref #increase in OOB error 
            }) 
        }) 
        #average over permutations 
        if(nPerm==1) catImp else apply(catImp,1,mean) 
    }) 
} 

#try it out 
out = catVar(rf,X.full,nPerm=4) 
print(out) #seems like it works as it should 

$X1 
      A       B       C 
0.14000 0.07125 0.06875 

$X2 
         A          B          C 
0.07458333 0.16083333 0.07666667 

$X3 
         A          B          C 
0.05333333 0.08083333 0.15375000 
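If the per-category values are meant for a report, one option is to stack the list into a single table. This is only a sketch: the `out` list below is typed in by hand from the printed values above so that the snippet runs on its own, and `imp.table` and its column names are made up for illustration. In practice the list returned by catVar() would be used directly.

```r
# Values copied from the printed output above; in practice use the list returned by catVar()
out <- list(
  X1 = c(A = 0.14000, B = 0.07125, C = 0.06875),
  X2 = c(A = 0.07458333, B = 0.16083333, C = 0.07666667),
  X3 = c(A = 0.05333333, B = 0.08083333, C = 0.15375000)
)

# Stack the list into one row per (variable, category) pair
imp.table <- do.call(rbind, lapply(names(out), function(v) {
  data.frame(variable = v,
             category = names(out[[v]]),
             oob.error.increase = unname(out[[v]]),
             row.names = NULL)
}))

# Sort so the categories with the largest increase in OOB error come first
imp.table <- imp.table[order(-imp.table$oob.error.increase), ]
print(imp.table, row.names = FALSE)
```

The first rows of the table are X2/B, X3/C and X1/A, which are the three categories the simulated data was built to depend on.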

2015-12-04 10:49:19

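This is a good idea, but it is still not exactly what I want. I want to use this in a report, so I need a number for each category. That would make it much easier for people to understand the effect of each category.

Well, you could repeat the pattern above for each category of each variable.

I wrote an R implementation that computes category-specific variable importance.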
