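Why do I get different results from mlr than from the underlying packages (such as rpart and mboost) in R?

I am doing survival analysis with mlr and with other packages. In mlr I used surv.rpart and surv.glmboost. I also ran the same analysis with the original rpart and mboost packages, and the results differ. Please see the example below: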
> myData2 <- data.frame(DaySum=c(3,2,1,6,3,2,2,5,2,7,2),
DaysDiff=c(24,4,5,12,3,31,131,6,35,18,19),
Status='TRUE')
> myData2$Status <- as.logical(myData2$Status)
> myTrain <- c(1:(nrow(myData2)-1))
> myTest <- nrow(myData2)
When I use surv.rpart in mlr, the result is:
> surv.task <- makeSurvTask(data=myData2,target=c('DaysDiff','Status'))
> surv.lrn <- makeLearner("surv.rpart")
> mod <- train(learner=surv.lrn,task=surv.task,subset=myTrain)
> surv.pred <- predict(mod,task=surv.task,subset=myTest)
> surv.pred
Prediction: 1 observations
predict.type: response
threshold:
time: 0.00
id truth.time truth.event response
11 11 19 TRUE 1
If I use the original rpart package, the result is:
> train <- myData2[1:(nrow(myData2)-1),]
> test <- myData2[nrow(myData2),]
> fit <- rpart(DaysDiff~DaySum,data=train)
> predict(fit,newdata=test)
[1] 26.9
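Why do I get two different results? It looks like the rpart package directly gives me the value I want, while the result from mlr has gone through some transformation. The same thing happens when I use surv.glmboost: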
> surv.task <- makeSurvTask(data=myData2,target=c('DaysDiff','Status'))
Warning messages:
1: Unknown or uninitialised column: 'Weibull'.
2: Unknown or uninitialised column: 'Cox'.
3: Unknown or uninitialised column: 'Month2'.
4: Unknown or uninitialised column: 'Month2'.
5: Unknown or uninitialised column: 'Month'.
6: Unknown or uninitialised column: 'Month'.
7: Unknown or uninitialised column: 'MonthsDiff'.
8: Unknown or uninitialised column: 'Weibull'.
9: Unknown or uninitialised column: 'Cox'.
> surv.lrn <- makeLearner("surv.glmboost")
> mod <- train(learner=surv.lrn,task=surv.task,subset=myTrain)
Warning message:
In names(data) != all.vars(formula[[2]]) :
longer object length is not a multiple of shorter object length
> surv.pred <- predict(mod,task=surv.task,subset=myTest)
> surv.pred
Prediction: 1 observations
predict.type: response
threshold:
time: 0.00
id truth.time truth.event response
11 11 19 TRUE -0.1946239
Here is the result using the mboost package:
> train <- myData2[1:(nrow(myData2)-1),]
Warning messages:
1: Unknown or uninitialised column: 'Weibull'.
2: Unknown or uninitialised column: 'Cox'.
3: Unknown or uninitialised column: 'Month2'.
4: Unknown or uninitialised column: 'Month2'.
5: Unknown or uninitialised column: 'Month'.
6: Unknown or uninitialised column: 'Month'.
7: Unknown or uninitialised column: 'MonthsDiff'.
8: Unknown or uninitialised column: 'Weibull'.
9: Unknown or uninitialised column: 'Cox'.
> test <- myData2[nrow(myData2),]
> fit <- glmboost(DaysDiff~DaySum,data=train)
> predict(fit,newdata=test)
[,1]
[1,] 33.08294
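This is what I have found so far. The same thing may happen with other learners such as surv.cforest. My questions are: why does this happen, and when I use the mlr package, how can I get results like the ones from rpart and mboost?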
Maybe it is because they use different parameters.
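One way to test the "different parameters" idea is to look at what is actually being fitted. The direct calls in the question, rpart(DaysDiff~DaySum) and glmboost(DaysDiff~DaySum), are ordinary regressions on DaysDiff and never see Status. The sketch below refits both packages with a Surv() response instead. It assumes (this is not confirmed anywhere in the question) that mlr's surv.rpart and surv.glmboost fit survival models of this kind, with CoxPH() as the boosting family. The names fit_reg, fit_surv, fit_cox and the pred_* variables are made up for illustration.

```r
library(survival)  # provides Surv()
library(rpart)
library(mboost)

# Same data as in the question
myData2 <- data.frame(DaySum   = c(3, 2, 1, 6, 3, 2, 2, 5, 2, 7, 2),
                      DaysDiff = c(24, 4, 5, 12, 3, 31, 131, 6, 35, 18, 19),
                      Status   = TRUE)
train <- myData2[-nrow(myData2), ]
test  <- myData2[nrow(myData2), ]

# Plain regression tree, as in the question: predicts a time in days
fit_reg  <- rpart(DaysDiff ~ DaySum, data = train)
pred_reg <- predict(fit_reg, newdata = test)

# Survival tree: with a Surv response rpart fits a survival tree and
# the prediction is a relative event rate, not a time
fit_surv  <- rpart(Surv(DaysDiff, Status) ~ DaySum, data = train)
pred_surv <- predict(fit_surv, newdata = test)

# Boosted Cox model (assumed to match what surv.glmboost does):
# the prediction is a linear predictor on the log-hazard scale
fit_cox  <- glmboost(Surv(DaysDiff, Status) ~ DaySum, data = train,
                     family = CoxPH())
pred_cox <- predict(fit_cox, newdata = test)

print(pred_reg)
print(pred_surv)
print(pred_cox)
```

If the assumption holds, pred_surv and pred_cox should be on the same scale as the mlr output (a risk score), whereas pred_reg is on the scale of DaysDiff. With only 10 training rows the default rpart settings do not split at all, so the 26.9 in the question is simply the mean of the training DaysDiff values.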