R - Using glm inside a data.table

I am trying to run some glms inside a data.table to produce modelled results split by key factors.

I have managed to do this successfully for:

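High-level GLM

glm(modellingDF,formula=Outcome~IntCol + DecCol,family=binomial(link=logit))

Scoped GLM with a single column

modellingDF[,list(Outcome, fitted=glm(x,formula=Outcome~IntCol,family=binomial(link=logit))$fitted),by=variable]

Scoped GLM with two integer columns

modellingDF[,list(Outcome, fitted=glm(x,formula=Outcome~IntCol+IntCol2,family=binomial(link=logit))$fitted),by=variable]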

However, when I try to run the high-level GLM in scoped form with my decimal column, it produces this error:

Error in model.frame.default(formula = Outcome ~ IntCol + DecCol, data = x, : 
    variable lengths differ (found for 'DecCol')

I thought this might be due to the partitions having different lengths, so I tested with a reproducible example:

library("data.table") 

testing<-data.table(letters=sample(rep(LETTERS,5000),5000), 
        letters2=sample(rep(LETTERS[1:5],10000),5000), 
        cont.var=rnorm(5000), 
        cont.var2=round(rnorm(5000)*1000,0), 
        outcome=rbinom(5000,1,0.8) 
        ,key="letters") 
testing.glm<-testing[,list(outcome,
        fitted=glm(x,formula=outcome~cont.var+cont.var2,family=binomial(link=logit))$fitted
     ),by=list(letters)]

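But this did not produce an error. I thought it might be due to NAs or something similar, but the summary of the data.table modellingDF gives no indication that there should be any problem: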

DecCol 
Min. :0.0416 
1st Qu.:0.6122 
Median :0.7220 
Mean :0.6794 
3rd Qu.:0.7840 
Max. :0.9495 

nrow(modellingDF[is.na(DecCol),]) # results in 0 

modellingDF[,list(len=.N,DecCollen=length(DecCol),IntCollen=length(IntCol),
    Outcomelen=length(Outcome)),by=Bracket]

   Bracket   len DecCollen IntCollen Outcomelen
1:     3-6 39184     39184     39184      39184
2:     1-2 19909     19909     19909      19909
3:       0  9912      9912      9912       9912

Maybe I am just having a sleepy day, but can anyone suggest a solution, or a way to dig into this issue further?

2013-09-25 Steph Locke

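NAs? [R variable length differ when build linear model for residuals](http://stackoverflow.com/questions/14924541/r-variable-length-differ-when-build-linear-model-for-residuals) – zx8754

I considered that, but `sapply(modellingDF, function(x) all(is.na(x)))` returns FALSE for every column –

Can you make a reproducible example that generates the error? You have shown an example that runs fine, but not one that produces the error, iiuc. –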

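You need to specify the data argument correctly within glm. Inside a data.table (using [), the data for the current group is referred to as .SD. (See the related question "create a formula in a data.table environment in R".)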

So

modellingDF[,list(Outcome, fitted = glm(data = .SD, 
    formula = Outcome ~ IntCol ,family = binomial(link = logit))$fitted), 
by=variable]

will work.
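The following is a minimal sketch of both halves of this point, run on synthetic data rather than the asker's modellingDF. The names demo, grp, int_col, dec_col and stray_col are made up for illustration. The first part shows one way the "variable lengths differ" message can arise in plain R, when a formula variable is missing from data and is picked up from the enclosing environment at a different length; whether that is exactly what happened in the question is an assumption. The second part fits one model per group with data = .SD.

```r
library(data.table)

set.seed(42)  # synthetic data; every name below is illustrative

n <- 3000
demo <- data.table(
  grp     = sample(c("a", "b", "c"), n, replace = TRUE),  # grouping column
  int_col = sample(1:10, n, replace = TRUE),              # integer predictor
  dec_col = runif(n)                                       # decimal predictor
)
demo[, outcome := rbinom(.N, 1, plogis(-1 + 0.1 * int_col + 1.5 * dec_col))]

# Part 1: a formula variable that is NOT a column of `data` is looked up in
# the formula's environment. If it has a different length there, model.frame
# stops with "variable lengths differ".
stray_col <- runif(7)  # deliberately not n values long
msg <- tryCatch(
  glm(outcome ~ int_col + stray_col, family = binomial(link = "logit"),
      data = demo),
  error = function(e) conditionMessage(e)
)

# Part 2: inside [.data.table, .SD is the subset of rows for the current
# group, so glm() only ever sees columns of matching length.
fits <- demo[, list(outcome,
                    fitted = glm(outcome ~ int_col + dec_col,
                                 family = binomial(link = "logit"),
                                 data = .SD)$fitted.values),
             by = grp]
```

Here msg holds the error text from the first call, and fits has one row per input row, with each fitted value coming from the model for that row's group.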

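While this approach is sound in this case (simply extracting the fitted values and moving on), using data.table and .SD can land you in a mess of environments if you want to save the whole model and then try to update it (see "Why is using update on a lm inside a grouped data.table losing its model data?").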

2013-09-25 11:43:50 mnel

This answer is a bit outdated. variable]' should work, and is cleaner. – MichaelChirico
