2017-08-10 58 views

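How to interpret the different results when using mlr versus other packages (such as rpart and mboost) in R

I am doing survival analysis with mlr and with other packages. In mlr, I used surv.rpart and surv.glmboost. I also ran the same analysis with the original packages, rpart and mboost, and found that the results differ. Please see the example below: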

> myData2 <- data.frame(DaySum=c(3,2,1,6,3,2,2,5,2,7,2), 
         DaysDiff=c(24,4,5,12,3,31,131,6,35,18,19), 
         Status='TRUE') 
> myData2$Status <- as.logical(myData2$Status) 
> myTrain <- c(1:(nrow(myData2)-1)) 
> myTest <- nrow(myData2) 

When I use surv.rpart in mlr, the result is:

> surv.task <- makeSurvTask(data=myData2,target=c('DaysDiff','Status')) 
> surv.lrn <- makeLearner("surv.rpart") 
> mod <- train(learner=surv.lrn,task=surv.task,subset=myTrain) 
> surv.pred <- predict(mod,task=surv.task,subset=myTest) 
> surv.pred 
Prediction: 1 observations 
predict.type: response 
threshold: 
time: 0.00 
    id truth.time truth.event response 
11 11   19  TRUE  1 

If I use the original rpart package, the result is:

> train <- myData2[1:(nrow(myData2)-1),] 
> test <- myData2[nrow(myData2),] 
> fit <- rpart(DaysDiff~DaySum,data=train) 
> predict(fit,newdata=test) 
[1] 26.9 

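Why do I get two different results? It looks like the rpart package directly gives me the result I want, while the result from mlr has gone through some transformation. The same thing happens when I use surv.glmboost: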

> surv.task <- makeSurvTask(data=myData2,target=c('DaysDiff','Status')) 
Warning messages: 
1: Unknown or uninitialised column: 'Weibull'. 
2: Unknown or uninitialised column: 'Cox'. 
3: Unknown or uninitialised column: 'Month2'. 
4: Unknown or uninitialised column: 'Month2'. 
5: Unknown or uninitialised column: 'Month'. 
6: Unknown or uninitialised column: 'Month'. 
7: Unknown or uninitialised column: 'MonthsDiff'. 
8: Unknown or uninitialised column: 'Weibull'. 
9: Unknown or uninitialised column: 'Cox'. 
> surv.lrn <- makeLearner("surv.glmboost") 
> mod <- train(learner=surv.lrn,task=surv.task,subset=myTrain) 
Warning message: 
In names(data) != all.vars(formula[[2]]) : 
    longer object length is not a multiple of shorter object length 
> surv.pred <- predict(mod,task=surv.task,subset=myTest) 
> surv.pred 
Prediction: 1 observations 
predict.type: response 
threshold: 
time: 0.00 
    id truth.time truth.event response 
11 11   19  TRUE -0.1946239 

Here is the result using the mboost package:

> train <- myData2[1:(nrow(myData2)-1),] 
Warning messages: 
1: Unknown or uninitialised column: 'Weibull'. 
2: Unknown or uninitialised column: 'Cox'. 
3: Unknown or uninitialised column: 'Month2'. 
4: Unknown or uninitialised column: 'Month2'. 
5: Unknown or uninitialised column: 'Month'. 
6: Unknown or uninitialised column: 'Month'. 
7: Unknown or uninitialised column: 'MonthsDiff'. 
8: Unknown or uninitialised column: 'Weibull'. 
9: Unknown or uninitialised column: 'Cox'. 
> test <- myData2[nrow(myData2),] 
> fit <- glmboost(DaysDiff~DaySum,data=train) 
> predict(fit,newdata=test) 
     [,1] 
[1,] 33.08294 

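This is what I have found so far. It may also happen with other learners such as surv.cforest. My questions are: why does this happen, and how can I get results like those from rpart and mboost when I use the mlr package?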


Maybe because they use different parameters.

Answer


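Your problem is that you are not fitting a survival model with rpart and glmboost, but a plain regression model. The formula DaysDiff~DaySum treats DaysDiff as an ordinary numeric response and never uses Status, whereas the mlr surv.* learners fit survival models on both target columns.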
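To make that concrete, here is a minimal sketch (my own illustration, not part of the original code; the names train.df, test.df, reg.fit and pred are made up). It shows that the plain rpart call from the question is an ordinary regression tree. With only 10 training rows, fewer than rpart's default minsplit of 20, the tree is never split, so the prediction is just the mean of DaysDiff in the training data:

```r
library(rpart)

# Same numbers as in the question: the first 10 rows for training,
# and the 11th row as the single test observation.
train.df <- data.frame(
  DaySum   = c(3, 2, 1, 6, 3, 2, 2, 5, 2, 7),
  DaysDiff = c(24, 4, 5, 12, 3, 31, 131, 6, 35, 18),
  Status   = TRUE
)
test.df <- data.frame(DaySum = 2, DaysDiff = 19, Status = TRUE)

# Plain regression tree: Status does not appear in the formula at all.
reg.fit <- rpart(DaysDiff ~ DaySum, data = train.df)

# The fitted tree consists of the root node only (no split was made).
nrow(reg.fit$frame)

# So the prediction equals the training mean of DaysDiff: 26.9.
pred <- predict(reg.fit, newdata = test.df)
pred
mean(train.df$DaysDiff)
```

So 26.9 is an average number of days, while the mlr response comes from a survival model and is on a different scale. The two numbers are not meant to be comparable.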

Fitting a survival model with the rpart package looks like this:

fit = rpart(Surv(DaysDiff, event = Status) ~ DaySum,data=train, method = "exp") 
predict(fit,newdata=test) 

The complete code for the comparison then gives the same result from both (each predicts 1):

library(mlr) 
library(rpart) 
library(survival) 
myData2 = data.frame(DaySum=c(3,2,1,6,3,2,2,5,2,7,2), 
    DaysDiff=c(24,4,5,12,3,31,131,6,35,18,19), 
    Status='TRUE') 
myData2$Status = as.logical(myData2$Status) 
# Hold out the last row as the test observation 
train = myData2[1:(nrow(myData2)-1),] 
test = myData2[nrow(myData2),] 
# mlr: the task holds only the training rows, so no subset argument is needed 
surv.task = makeSurvTask(data=train,target=c('DaysDiff','Status')) 
surv.lrn = makeLearner("surv.rpart") 
# train() still resolves to the mlr function even though a data frame is also named train 
mod = train(learner=surv.lrn,task=surv.task) 
surv.pred = predict(mod,newdata = test) 
surv.pred 
# rpart directly: a survival tree (method = "exp") instead of a regression tree 
fit = rpart(Surv(DaysDiff, event = Status) ~ DaySum,data=train, method = "exp") 
predict(fit,newdata=test) 
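The same reasoning should carry over to glmboost. What follows is only a rough sketch, assuming the mboost package is installed. I am also assuming (not confirmed here against the mlr source) that surv.glmboost fits a Cox model and returns the linear predictor. The names train.df, test.df, cox.fit and lp are made up for the illustration, and I have not checked that this reproduces -0.1946239 exactly, since mlr may use different defaults:

```r
library(mboost)
library(survival)

# Same numbers as in the question: 10 training rows, 1 test row.
train.df <- data.frame(
  DaySum   = c(3, 2, 1, 6, 3, 2, 2, 5, 2, 7),
  DaysDiff = c(24, 4, 5, 12, 3, 31, 131, 6, 35, 18),
  Status   = TRUE
)
test.df <- data.frame(DaySum = 2, DaysDiff = 19, Status = TRUE)

# Boosted Cox proportional hazards model instead of a Gaussian regression:
# the response is a Surv object and the family is CoxPH().
cox.fit <- glmboost(Surv(DaysDiff, Status) ~ DaySum,
                    data = train.df, family = CoxPH())

# Linear predictor (log relative risk) for the test row. This is a risk
# score, not a number of days, so it can be negative.
lp <- predict(cox.fit, newdata = test.df, type = "link")
lp
```

If you want a prediction in days, as in your original glmboost call, that is a regression task and not a survival task, so there is nothing for mlr's surv.* learners to match.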

This is great. Thank you very much.
