2017-04-04

Automating regressions with specific dependent and independent variables

MVE: let this be the dataset:

data <- data.frame(year = rep(seq(1966,2015,1), 8), 
       county = c(rep('prva', 50), rep('druga', 50), rep('treća', 50), rep('četvrta', 50), 
          rep('peta', 50), rep('šesta', 50), rep('sedma', 50), rep('osma', 50)), 
       crime1 = runif(400), crime2 = runif(400), crime3 = runif(400), 
       uvar1 = runif(400), uvar2 = runif(400), uvar3 = runif(400), 
       var1 = runif(400), var2 = runif(400), var3 = runif(400), var4 = runif(400), var5 = runif(400)) 

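Suppose crime1, crime2 and crime3 are the specific dependent variables, uvar1, uvar2 and uvar3 are the specific independent variables, and var1, var2, etc. are other covariates. What I want to do is automate the regressions.

That is, I want to get the results of this code: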

plm(log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)

plm(log(crime2) ~ log(uvar2) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)

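and so on, but without writing 20 lines of code for every estimated model.

After looking through similar questions, this is as far as I got: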

crime <- c('crime1', 'crime2', 'crime3') 
plm.results <- lapply(data[, crime], function(y) plm(y ~ var1 + var2 + var3 + var4, 
                model = 'within', effect ='twoways', data = data)) 

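This certainly takes care of my dependent variables, but I can't figure out how to include the specific independent variables in these estimations. To clarify once more: I want uvar1 in the first regression but not in the others.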

Answer

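The formula function is useful when creating multiple models. You can incorporate the variation between models by building each formula with paste0 and using lapply to iterate over the indices 1 to 3.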

library(plm)

#remember to set.seed when sampling from distributions
set.seed(123)

#regDF is the question's data frame; this assumes `data` was (re)created
#after the set.seed() call above
regDF = data

#a helper function to create "log(var)" from "var"
fn_appendLog = function(x) {
  paste0("log(", x, ")")
}

modelList = lapply(1:3, function(x) {

  #covariates shared by every model: all columns whose names start with "v"
  indepVars2 = Reduce(function(x, y) paste(x, y, sep = "+"),
                      lapply(colnames(regDF)[grepl("^v", colnames(regDF))], fn_appendLog))
  #> indepVars2
  #[1] "log(var1)+log(var2)+log(var3)+log(var4)+log(var5)"

  #model-specific regressor and response: uvar<x> and crime<x>
  indepVars1 = fn_appendLog(paste0("uvar", x))
  depVar = fn_appendLog(paste0("crime", x))

  formulaVar = formula(paste0(depVar, " ~ ", indepVars1, "+", indepVars2))
  #> formulaVar
  #log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4) + log(var5)

  #the fitted model is the last expression, so lapply collects it
  plm(formulaVar, model = 'within', effect = 'twoways', data = regDF)
})

Summary:

summary(modelList[[1]]) 

#> summary(modelList[[1]]) 
#Twoways effects Within Model 
# 
#Call: 
#plm(formula = formulaVar, data = regDF, effect = "twoways", model = "within") 
# 
#Balanced Panel: n=50, T=8, N=400 
# 
#Residuals :
#   Min. 1st Qu.  Median 3rd Qu.    Max.
# -5.730  -0.396   0.116   0.599   1.520
#
#Coefficients :
#             Estimate Std. Error t-value Pr(>|t|)
#log(uvar1)  0.0393871  0.0490891  0.8024   0.4229
#log(var1)  -0.0369356  0.0541029 -0.6827   0.4953
#log(var2)  -0.0455269  0.0543664 -0.8374   0.4030
#log(var3)   0.0150516  0.0520347  0.2893   0.7726
#log(var4)  -0.0034534  0.0441506 -0.0782   0.9377
#log(var5)  -0.0109038  0.0527446 -0.2067   0.8363
# 
#Total Sum of Squares: 302.23 
#Residual Sum of Squares: 300.6 
#R-Squared:  0.0053896 
#Adj. R-Squared: 0.0045407 
#F-statistic: 0.304357 on 6 and 337 DF, p-value: 0.93448 
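
To compare the fits side by side, the coefficient on each model's own uvar term can be pulled out of the list. The following is only a minimal sketch: it uses lm on made-up data as a stand-in so that it runs without plm installed, and the names toyDF, toyModels and uvarCoef are hypothetical. With the plm fits above, coef(modelList[[i]]) should be usable in the same way.

```r
# Made-up data standing in for regDF (assumption: not the question's data)
set.seed(1)
toyDF <- data.frame(crime1 = runif(40), crime2 = runif(40), crime3 = runif(40),
                    uvar1 = runif(40), uvar2 = runif(40), uvar3 = runif(40),
                    var1 = runif(40), var2 = runif(40))

# Same formula-building idea as in the answer, with lm() swapped in for plm()
toyModels <- lapply(1:3, function(i) {
  f <- as.formula(paste0("log(crime", i, ") ~ log(uvar", i, ") + log(var1) + log(var2)"))
  lm(f, data = toyDF)
})
names(toyModels) <- paste0("crime", 1:3)

# One number per model: the coefficient on that model's own uvar term
uvarCoef <- sapply(1:3, function(i) {
  coef(toyModels[[i]])[[paste0("log(uvar", i, ")")]]
})
names(uvarCoef) <- names(toyModels)
print(uvarCoef)
```

Naming the list elements after the dependent variables makes it easier to tell the models apart later than indexing by position.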

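Explanation:

There are two types of independent variables: the model-specific one (uvar1 and so on) and the shared covariates var1...varN.

1) colnames(regDF)[grepl("^v",colnames(regDF))] gives us all the variables in regDF whose names start with the letter "v". In a pattern, the caret symbol ^ marks the start of the string and $ marks the end of the string. The output at this stage is c("var1","var2",...,"var5").

2) We need the log-transformed version of this vector of variable names, so we pass them through lapply to the function fn_appendLog, which results in list("log(var1)","log(var2)",...,"log(var5)").

3) Once we have the list, we need to turn these variables into log(var1)+log(var2)...+log(var5).

4) To do this, we use the function Reduce with the function paste(x,y,sep="+"), which takes each element of the list above together with its adjacent element and joins them with "+" as the separator: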

    step1 = log(var1)+log(var2)
    step2 = (log(var1)+log(var2)) + log(var3)
    step3 = (log(var1)+log(var2)+log(var3)) + log(var4), and so on

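5) Reduce applies that function across the list and accumulates the output into the final result, a single character vector: log(var1)+log(var2)+log(var3)+log(var4)+log(var5)

This may seem intimidating at first, but once you use these functions regularly and explore their examples, they will become part of your repertoire in no time. The best way to learn a function, say lapply, is to read its documentation end to end with ?lapply, run the listed examples, modify the arguments and get familiar with it. I hope this sheds some light on your query.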
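
The steps above can be traced one at a time in base R. This is a small sketch on a hypothetical vector of column names (cols, covars, logged, steps, rhs and rhsCollapse are illustrative names, not part of the answer's code); Reduce's accumulate = TRUE argument keeps every intermediate string so the folding is visible.

```r
# Hypothetical column names mirroring the question's data frame
cols <- c("year", "county", "crime1", "uvar1", "var1", "var2", "var3")

# Step 1: keep names that start with "v" ("uvar1" starts with "u", so it is dropped)
covars <- cols[grepl("^v", cols)]

# Step 2: wrap each name in log(...)
logged <- lapply(covars, function(x) paste0("log(", x, ")"))

# Steps 3-5: Reduce folds the list pairwise from the left;
# accumulate = TRUE keeps every intermediate result
steps <- Reduce(function(x, y) paste(x, y, sep = "+"), logged, accumulate = TRUE)
print(steps)

# The last element is what Reduce returns when accumulate is left at FALSE
rhs <- steps[[length(steps)]]

# paste() with collapse builds the same string in a single call
rhsCollapse <- paste(unlist(logged), collapse = "+")
```

Since paste with collapse yields the same right-hand side, Reduce is a matter of taste here; it becomes more useful when the combining function is not a simple string join.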

Exactly what I was looking for. Thanks a lot! – Astronaut

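Although it works perfectly, I would like to understand what you did here, and I struggled a lot with this part: indepVars2 = Reduce(function(x,y) paste(x,y,sep="+"), lapply(colnames(data)[grepl("^v",colnames(data))],fn_appendLog)). Could you elaborate on what exactly this part does? – Astronaut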

I have added an explanation of the steps involving 'Reduce' and 'lapply'; let me know if this is sufficient. – OdeToMyFiddle