我正在解決多類分類問題並嘗試使用通用增強模型(R中的gbm包)。我遇到的問題:插入符號train
函數method="gbm"
似乎不能正確處理多類數據。下面介紹一個簡單的例子。gbm方法用於多類別分類的脫字符號的使用
library(gbm)
library(caret)
data(iris)
fitControl <- trainControl(method="repeatedcv",
number=5,
repeats=1,
verboseIter=TRUE)
set.seed(825)
gbmFit <- train(Species ~ ., data=iris,
method="gbm",
trControl=fitControl,
verbose=FALSE)
gbmFit
輸出是
+ Fold1.Rep1: interaction.depth=1, shrinkage=0.1, n.trees=150
predictions failed for Fold1.Rep1: interaction.depth=1, shrinkage=0.1, n.trees=150
- Fold1.Rep1: interaction.depth=1, shrinkage=0.1, n.trees=150
+ Fold1.Rep1: interaction.depth=2, shrinkage=0.1, n.trees=150
...
+ Fold5.Rep1: interaction.depth=3, shrinkage=0.1, n.trees=150
predictions failed for Fold5.Rep1: interaction.depth=3, shrinkage=0.1, n.trees=150
- Fold5.Rep1: interaction.depth=3, shrinkage=0.1, n.trees=150
Aggregating results
Selecting tuning parameters
Fitting interaction.depth = numeric(0), n.trees = numeric(0), shrinkage = numeric(0) on full training set
Error in if (interaction.depth < 1) { : argument is of length zero
然而,如果我嘗試使用GBM沒有插入符號的包裝,我得到很好的結果。
set.seed(1365)
train <- createDataPartition(iris$Species, p=0.7, list=F)
train.iris <- iris[train,]
valid.iris <- iris[-train,]
gbm.fit.iris <- gbm(Species ~ ., data=train.iris, n.trees=200, verbose=FALSE)
gbm.pred <- predict(gbm.fit.iris, valid.iris, n.trees=200, type="response")
gbm.pred <- as.factor(colnames(gbm.pred)[max.col(gbm.pred)]) ##!
confusionMatrix(gbm.pred, valid.iris$Species)$overall
FYI,由##!
標記行代碼轉換由predict.gbm
返回到最可能的類的類因子概率的矩陣。輸出是
Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull AccuracyPValue McnemarPValue
9.111111e-01 8.666667e-01 7.877883e-01 9.752470e-01 3.333333e-01 8.467252e-16 NaN
任何建議如何使gbm在多類數據上正確工作插入符號?
UPD:
sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] e1071_1.6-1 class_7.3-5 gbm_2.0-8 survival_2.36-14 caret_5.15-61 reshape2_1.2.2 plyr_1.8
[8] lattice_0.20-13 foreach_1.4.0 cluster_1.14.3 compare_0.2-3
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.3 grid_2.15.3 iterators_1.0.6 stringr_0.6.2 tools_2.15.3
只是一個問題,你爲什麼要使用2個不同的種子? 825和1365? – agstudy 2013-03-23 10:49:25
重要嗎? 825 - 是我採取的一個示例代碼的種子[caret.r-forge.r-project.org](http://caret.r-forge.r-project.org/training.html),1365 - 種子我用在我的項目中。 – maruan 2013-03-23 12:14:21