2016-09-19

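Automating multiple regression with variable transformations

I am working on a regression problem and trying to make the process more automated. For each x variable I have a matrix X of transformations I want to test (each column is one transformation of that x variable). So I need a loop that takes one column from each X matrix, regresses y on them, and stores the t value of each variable.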

I have it working for 2 X variables, but I need your help extending it to n variables. The code is below.

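testvars <- function(y, X1, X2) {
    # one t-value table per predictor: row i = column i of X1, column j = column j of X2
    Tvals_X1 <- data.frame(matrix(0, ncol = ncol(X2), nrow = ncol(X1)))
    Tvals_X2 <- data.frame(matrix(0, ncol = ncol(X2), nrow = ncol(X1)))

    for (i in seq_len(ncol(X1))) {
        for (j in seq_len(ncol(X2))) {
            temp <- lm(y ~ X1[, i] + X2[, j])
            coefs <- summary(temp)$coefficients
            Tvals_X1[i, j] <- coefs[2, 3]  # t value of the X1 term
            Tvals_X2[i, j] <- coefs[3, 3]  # t value of the X2 term
        }
    }
    list(X1 = Tvals_X1, X2 = Tvals_X2)  # return both tables (a bare for loop returns NULL)
}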

Answers


Here is my approach:

# example data
set.seed(1); y <- matrix(runif(20), ncol=1) 
set.seed(2); x1 <- matrix(runif(60), ncol=3) 
set.seed(3); x2 <- matrix(runif(80), ncol=4) 
set.seed(4); x3 <- matrix(runif(40), ncol=2) 
set.seed(5); x4 <- matrix(runif(60), ncol=3) 
I make all combinations of the column numbers:
col.v <- sapply(list(x1,x2,x3,x4), ncol)          # ncol of each matrix
col.comb <- expand.grid(lapply(col.v, seq_len))   # all combinations of column indices
# > head(col.comb, n=5)
#   Var1 Var2 Var3 Var4
# 1    1    1    1    1
# 2    2    1    1    1
# 3    3    1    1    1
# 4    1    2    1    1
# 5    2    2    1    1
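The grid has one row per model that will be fitted, so its size is the product of the column counts. Building the index list with sapply() is also fragile: when every matrix has the same number of columns, sapply() simplifies to a matrix, which expand.grid() then flattens into a single column, whereas lapply() always returns a list. A small sketch with the example's column counts plus a hypothetical equal-count case:

```r
col.v <- c(3, 4, 2, 3)                            # ncol of x1..x4 in the example
col.comb <- expand.grid(lapply(col.v, seq_len))   # one row per model to fit
nrow(col.comb)                                    # 72 = 3 * 4 * 2 * 3

same <- c(2, 2)                                   # hypothetical: two matrices with 2 columns each
ncol(expand.grid(sapply(same, seq_len)))          # 1: the 2 x 2 matrix was flattened
ncol(expand.grid(lapply(same, seq_len)))          # 2: one column per matrix, as intended
```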
I get the t values with apply(col.comb, 1, ...):
tval <- t(apply(col.comb, 1, function(a) {
    temp <- lm(y ~ x1[,a[1]] + x2[,a[2]] + x3[,a[3]] + x4[,a[4]])
    summary(temp)$coefficients[2:5, 3] }))   # t values of the 4 slopes; t() gives one row per combination

# > head(tval, n=2)    # tval is a matrix, one row per combination
#  x1[, a[1]] x2[, a[2]] x3[, a[3]] x4[, a[4]] 
# [1,] -0.05692452 -0.9047370 -0.3758997 1.968530 
# [2,] 0.03476527 -0.9260632 -0.3740936 1.965884 
I reshape each column of the tval matrix into an array over all combinations and collect the arrays in a list:
results <- list()   # results[[1]] is x1's array 
for(i in seq.int(length(col.v))) results[[i]] <- array(tval[,i], dim=col.v) 
# names(results) <- c("x1", "x2", "x3", "x4") # if you want 

results2 <- array(t(tval), dim=c(length(col.v), col.v)) # all-array version
## results[[1]] is the same as results2[1,,,,]  # both are x1's array
# dimnames(results2) <- c(list(c("x1", "x2", "x3", "x4")), rep(list(NULL), length(col.v)))  # if you need names
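Reshaping with array() works because expand.grid() varies the first index fastest, which is the same column-major order array() uses to fill its cells. A quick check with made-up values that encode their own indices (the dimensions here are arbitrary):

```r
col.v <- c(3, 4, 2)                                  # arbitrary column counts
grid <- expand.grid(lapply(col.v, seq_len))
# encode each combination as a number, e.g. (2, 3, 1) -> 231
vals <- grid$Var1 * 100 + grid$Var2 * 10 + grid$Var3
arr <- array(vals, dim = col.v)
arr[2, 3, 1]                                         # 231
```

Indexing arr with any row of grid returns that row's encoded value, so arr[i, j, k] holds the result for columns i, j, k.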
Check:
c(results[[1]][2,3,2,3], results[[2]][2,3,2,3], results[[3]][2,3,2,3], results[[4]][2,3,2,3]) 
# [1] 0.54580342 -0.56418433 -0.02780492 -0.50140806 

c(results2[1,2,3,2,3], results2[2,2,3,2,3], results2[3,2,3,2,3], results2[4,2,3,2,3]) 
# [1] 0.54580342 -0.56418433 -0.02780492 -0.50140806 

summary(lm(y ~ x1[,2] + x2[,3] + x3[,2] + x4[,3]))$coefficients[2:5,3] 
# x1[, 2]  x2[, 3]  x3[, 2]  x4[, 3] 
# 0.54580342 -0.56418433 -0.02780492 -0.50140806 # no problem 
Function version (n = 4):
testvars2 <- function(y, x1, x2, x3, x4){

    col.v <- sapply(list(x1,x2,x3,x4), ncol)          # ncol of each matrix
    col.comb <- expand.grid(lapply(col.v, seq_len))   # all combinations of column indices

    # one row of t values (for the 4 slopes) per combination
    tval <- t(apply(col.comb, 1, function(a) {
    temp <- lm(y ~ x1[,a[1]] + x2[,a[2]] + x3[,a[3]] + x4[,a[4]])
    summary(temp)$coefficients[2:5, 3] }))

    # reshape each predictor's t values into an array over all combinations
    results <- list()
    for(i in seq_along(col.v)) results[[i]] <- array(tval[,i], dim=col.v)
    #results2 <- array(t(tval), dim=c(length(col.v), col.v))

    return(results)
}
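The function above is fixed at four matrices. Below is a minimal sketch of an n-matrix variant; the name testvars_n and its use of `...` are assumptions, not part of the answer. It assumes each matrix has length(y) rows and that no chosen set of columns is exactly collinear (otherwise lm() returns NA coefficients and summary() drops those rows, misaligning the t values).

```r
# Hypothetical n-matrix generalisation of testvars2 (name and structure are assumptions).
testvars_n <- function(y, ...) {
  Xs <- list(...)                                              # X1, X2, ..., Xn
  n <- length(Xs)
  col.v <- vapply(Xs, ncol, integer(1), USE.NAMES = FALSE)     # columns per matrix
  col.comb <- as.matrix(expand.grid(lapply(col.v, seq_len)))   # one row per combination

  tv <- apply(col.comb, 1, function(a) {
    # column a[k] of the k-th matrix, for k = 1..n, bound side by side
    picked <- sapply(seq_len(n), function(k) Xs[[k]][, a[k]])
    summary(lm(y ~ picked))$coefficients[-1, 3]                # slope t values, intercept dropped
  })
  # one row per combination, one column per predictor (also correct when n == 1)
  tval <- matrix(tv, ncol = n, byrow = TRUE)

  # reshape each predictor's t values into an array indexed like col.comb
  lapply(seq_len(n), function(k) array(tval[, k], dim = col.v))
}
```

With the example data, testvars_n(y, x1, x2, x3, x4) fits the same models as testvars2(y, x1, x2, x3, x4), so it should return the same t values in the same layout. The number of fits still grows as the product of the column counts.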

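My X matrices have more than 600 columns, which makes expand.grid fail with an error. Do you have any suggestions for how to get around that? –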


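@SevaGumeniuk; if you want the result as an 'array', its 'dim' becomes 'c(ncol(X1), ncol(X2), ..., ncol(Xn))'. Please try 'test_array <- array(1, dim = c(ncol(X1), ncol(X2), ..., ncol(Xn)))'. If R returns a size-related error, then it is not possible to hold the result as an 'array'. – cuttlefish44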
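To put numbers on this, here is the arithmetic for a hypothetical case of four X matrices with 600 columns each. Every row of the grid is one lm() fit, and each result array holds one double (8 bytes) per row:

```r
ncols <- c(600, 600, 600, 600)   # hypothetical: 4 predictors, 600 transformations each
n_models <- prod(ncols)          # rows expand.grid() would need = number of lm() fits
n_models                         # 1.296e+11
n_models * 8 / 1024^4            # about 0.94 TiB for just one predictor's array of doubles
```

By comparison, two such matrices give 600^2 = 360000 fits, which is large but still tractable.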


Since this is StackOverflow and not CrossValidated, I'll skip the warnings about the problems with this kind of variable-selection approach. Caveat emptor.

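Computationally, calling lm or glm over and over makes R do quite a lot of bookkeeping; instead I would suggest the add1 and drop1 functions. Below is the output for the example from the add1 help page, which tries adding each two-way interaction (plus a squared Education term) to the model one at a time. In your case each predictor uses 1 degree of freedom, so the F statistic is the square of the t statistic.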

> lm1 <- lm(Fertility ~ ., data = swiss)
> add1(lm1, ~ I(Education^2) + .^2, test='F')
Single term additions

Model:
Fertility ~ Agriculture + Examination + Education + Catholic + 
    Infant.Mortality
                             Df  Sum of Sq       RSS       AIC F value  Pr(>F)
<none>                                     2105.0429 190.69135
I(Education^2)                1  11.818686 2093.2242 192.42672 0.22585 0.63721
Agriculture:Examination       1  10.667353 2094.3756 192.45257 0.20373 0.65416
Agriculture:Education         1   1.826563 2103.2164 192.65055 0.03474 0.85309
Agriculture:Catholic          1  75.047836 2029.9951 190.98513 1.47878 0.23109
Agriculture:Infant.Mortality  1   4.438027 2100.6049 192.59215 0.08451 0.77278
Examination:Education         1  48.693777 2056.3492 191.59137 0.94719 0.33628
Examination:Catholic          1  40.757983 2064.2850 191.77240 0.78977 0.37948
Examination:Infant.Mortality  1  65.856710 2039.1862 191.19745 1.29182 0.26248
Education:Catholic            1 278.189298 1826.8536 186.02953 6.09111 0.01796 *
Education:Infant.Mortality    1  92.950398 2012.0925 190.56880 1.84784 0.18165
Catholic:Infant.Mortality     1   2.358769 2102.6842 192.63865 0.04487 0.83332
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
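Applied to this question, a minimal sketch with hypothetical names: a data frame dat holding the response y, one covariate z that stays in the model, and candidate transformations X1, X2, X3 of another variable. The base model is fitted once and add1() tries each candidate column on its own; because each candidate uses 1 df, sqrt(F) recovers the absolute t value that column would have in the larger model (the sign is lost).

```r
set.seed(1)
# hypothetical data: response y, fixed covariate z, 3 candidate transformations X1..X3
dat <- data.frame(y = runif(20), z = runif(20))
X <- matrix(runif(60), ncol = 3, dimnames = list(NULL, paste0("X", 1:3)))
dat <- cbind(dat, X)

base <- lm(y ~ z, data = dat)
# try adding each candidate column to the base model, one at a time
cand <- add1(base, scope = reformulate(colnames(X)), test = "F")
# with 1 df per added term, F = t^2, so sqrt(F) is the absolute t value
abs_t <- sqrt(cand[colnames(X), "F value"])
names(abs_t) <- colnames(X)
abs_t
```

This covers one block of candidates at a time; combining choices across several X matrices still needs an outer loop over the other blocks.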