2016-02-26 105 views
0

我有一個結果數據,Y和10個預測變量(X1-X10)。循環通過R中的變量

set.seed(1001) 
n <- 100 
Y < c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0) 
X1 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.4,0.5)) 
X2 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.5,0.25,0.25)) 
X3 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.3,0.4,0.4)) 
X4 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3)) 
X5 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.2,0.7)) 
X6 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.8,0.1,0.1)) 
X7 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.1,0.1,0.8)) 
X8 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3)) 
X9 <- sample(x=c(0,1,2), size=n, replace=TRUE, prob=c(0.35,0.35,0.3)) 
X10 <- c(0,2,2,2,2,2,2,2,0,2,0,2,2,0,0,0,0,0,2,0,0,2,2,0,0,2,2,2,0,2,0,2,0,2,1,2,1,1,1,1,1,1,1,1,1,1,1,0,1,2,2,2,2,2,2,2,2,2,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,0,0,0,0) 

datasim <- data.frame(Y,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10) 

我的目標是擬合每個預測變量的邏輯斯蒂模型並計算偏差差異(d偏差)。後來引導dDeviance 1000次(R = 1000)。我嘗試了以下函數,它一次適用於一個變量。你能否建議我如何增強代碼,使其通過變量1到10循環,計算d偏差並稍後自舉值。

glmfunction <- function(data,indices) 
{ 
glm.snp1 <- glm(Y~X1, family="binomial", data=data[indices,]) 
null <- glm.snp1$null.deviance 
residual <- glm.snp1$deviance 
dDeviance <-(null-residual) 
return(dDeviance) 
} 

result <- boot(datasim,glmfunction, R=1000) 

回答

3

有可能有很多方法來解決這個問題,但這裏是我如何做到這一點。我首先創建的獨立變量我想在我的模型使用的載體:

#vector of independent variables 
iv <- grep("X",colnames(datasim), value=T) 

然後我遍歷他們擬合模型並提取dDeviance。這確保了我的引導函數不會返回一個值,而是一個長度矢量(獨立變量的數量)。

glmfunction <- function(data,indices, iv){ 
    res <- sapply(iv, function(x){ 
    fit <- glm(formula=sprintf("Y~%s",x), family="binomial", data=data[indices,]) 
    #deviance 
    dDeviance <- with(fit, null.deviance - deviance) 
    return(dDeviance) 
    }) 
    res 
} 

我選擇把iv引導功能的一個正式的說法,所以你必須指定它,不意外的範圍界定,問題的運行,靈活性和易於調試。然後你可以運行你的引導程序:

result <- boot(datasim,glmfunction, iv = iv, R=10) 
+0

非常感謝@Heroka。它工作得很好。 – Shima

+0

不客氣!我很高興看到你已經完美地解決了你以前的問題(帶有索引/奇怪的結果)。 – Heroka