2014-06-16

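Bayesian prediction, subscript out of bounds

I'm having some problems with the predict function when using bayesglm. I've read some posts saying this problem can arise when the out-of-sample data has more factor levels than the in-sample data, but I'm using the same data for both fitting and prediction. predict works fine with a regular glm, but not with bayesglm. Example: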

library(arm)  # provides bayesglm()

control <- y ~ x1 + x2

# this works fine: 
glmObject <- glm(control, myData, family = binomial()) 
predicted1 <- predict.glm(glmObject , myData, type = "response") 

# this gives an error: 
bayesglmObject <- bayesglm(control, myData, family = binomial()) 
predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response") 
Error in X[, piv, drop = FALSE] : subscript out of bounds 

# Edit... I just discovered this works. 
# Should I be concerned about using these results? 
# Not sure why it fails when I specify the dataset
predicted3 <- predict(bayesglmObject, type = "response") 

I can't figure out how to predict with a bayesglm object. Any ideas? Thanks!

Answer


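One possible cause is the default setting of the `drop.unused.levels` argument of bayesglm. It defaults to TRUE, so any unused factor levels are dropped while the model is built. The predict function, however, still works from the original data, in which the factor variables keep their unused levels. The result is a mismatch in levels between the data used to build the model and the data used for prediction, even though both come from the same data frame (myData in your case). Here is an example: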

    library(arm)  # provides bayesglm()

    n <- 100
    x1 <- rnorm(n)
    x2 <- as.factor(sample(c(1, 2, 3), n, replace = TRUE))

    # Replacing every 3 with 2 leaves level "3" defined but unused
    x2[x2==3] <- 2 

    y <- as.factor(sample(c(1,2),n,replace = TRUE)) 

    myData <- data.frame(x1 = x1, x2 = x2, y = y) 
    control <- y ~ x1 + x2 

    # this works fine: 
    glmObject <- glm(control, myData, family = binomial()) 
    predicted1 <- predict.glm(glmObject , myData, type = "response") 

    # this gives an error - this uses default drop.unused.levels = TRUE 
    bayesglmObject <- bayesglm(control, myData, family = binomial()) 
    predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response") 

    Error in X[, piv, drop = FALSE] : subscript out of bounds 

    # this works fine - value of drop.unused.levels is set to FALSE 
    bayesglmObject <- bayesglm(control, myData, family = binomial(),drop.unused.levels = FALSE) 
    predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response") 
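
To check whether a data frame carries unused factor levels before fitting, something along these lines may help. This is only a sketch in base R: `unused_levels` is a hypothetical helper name, not a function from arm or base R, and the toy data is built deterministically so that level "3" is certain to end up unused.

```r
# Hypothetical helper (not part of arm or base R): for each factor column,
# return the levels that are declared but never observed in the data.
unused_levels <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  out <- lapply(df[is_fac], function(f) setdiff(levels(f), as.character(f)))
  out[lengths(out) > 0]  # keep only columns that actually have unused levels
}

n <- 100
x2 <- factor(rep(c(1, 2, 3), length.out = n))  # levels "1", "2", "3"
x2[x2 == 3] <- 2                               # level "3" is now unused
myData <- data.frame(x1 = seq_len(n), x2 = x2)

unused_levels(myData)  # a named list with one entry: x2 = "3"
```

An empty list from the helper would suggest that unused levels are not the cause of the error.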

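I think a better approach is to use droplevels to remove the unused levels from the data frame beforehand, and then use that cleaned data frame for both modelling and prediction.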
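
A minimal sketch of that approach, assuming the same kind of toy data as above (built deterministically here) and assuming the arm package is installed for the bayesglm step. The name `cleanData` is made up for illustration, and the fit is skipped if arm is not available.

```r
n <- 100
x2 <- factor(rep(c(1, 2, 3), length.out = n))
x2[x2 == 3] <- 2  # level "3" stays defined but is no longer used

set.seed(1)
myData <- data.frame(
  x1 = rnorm(n),
  x2 = x2,
  y  = as.factor(sample(c(1, 2), n, replace = TRUE))
)

# Drop unused levels from every factor column in the data frame
cleanData <- droplevels(myData)
levels(myData$x2)     # "1" "2" "3"
levels(cleanData$x2)  # "1" "2"

# Fit and predict on the same cleaned data frame (only if arm is installed)
if (requireNamespace("arm", quietly = TRUE)) {
  fit <- arm::bayesglm(y ~ x1 + x2, data = cleanData, family = binomial())
  predicted <- predict(fit, newdata = cleanData, type = "response")
}
```

Because the model and the prediction both see `cleanData`, their factor levels agree regardless of how `drop.unused.levels` is set.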