2015-11-25 80 views
2

我正嘗試從STATA複製GLM估計:的Logit二項迴歸與集羣標準誤差

sysuse auto 
logit foreign weight mpg, cluster(rep78) 


Logistic regression        Number of obs =   69 
                Wald chi2(2) =  31.57 
                Prob > chi2  =  0.0000 
Log pseudolikelihood = -22.677963     Pseudo R2  =  0.4652 

            (Std. Err. adjusted for 5 clusters in rep78) 
------------------------------------------------------------------------------ 
     |    Robust 
foreign |  Coef. Std. Err.  z P>|z|  [95% Conf. Interval] 
-------------+---------------------------------------------------------------- 
    weight | -.0042281 .0008022 -5.27 0.000 -.0058004 -.0026558 
    mpg | -.1524015 .0271285 -5.62 0.000 -.2055724 -.0992306 
    _cons | 14.21689 2.031826  7.00 0.000  10.23458 18.19919 

我已經試過我已經在網上找到不同的答案:

data <- read.dta("http://www.stata-press.com/data/r9/auto.dta") 
data <- data[,c("foreign", "weight", "mpg", "rep78")] 
data <- na.omit(data) 
logit.model <- glm(foreign ~ weight + mpg , data, family = binomial(logit)) 
clx <- function(fm, dfcw, cluster){ 
    library(sandwich);library(lmtest) 
    M <- length(unique(cluster)) 
    N <- length(cluster)   
    K <- fm$rank       
    dfc <- (M/(M-1))*((N-1)/(N-K)) 
    uj <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum)); 
    vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)*dfcw 
    coeftest(fm, vcovCL) 
} 
clx(logit.model, 1, data$rep78) 

z test of coefficients: 

       Estimate Std. Error z value Pr(>|z|)  
(Intercept) 14.21688944 2.06235711 6.8935 5.443e-12 *** 
weight  -0.00422810 0.00081428 -5.1924 2.076e-07 *** 
mpg   -0.15240148 0.02753611 -5.5346 3.119e-08 *** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

或:

logit.model <- lrm(foreign ~ weight + mpg, x=T, y=T, data=data) 
robcov(logit.model, cluster=data$rep78) 

      Coef S.E. Wald Z Pr(>|Z|) 
Intercept 14.2169 1.8173 7.82 <0.0001 
weight -0.0042 0.0007 -5.89 <0.0001 
mpg  -0.1524 0.0243 -6.28 <0.0001 

或:

logit.model <- glm(foreign ~ weight + mpg , data, family = binomial(logit)) 
G <- length(unique(data$rep78)) 
N <- length(data$rep78) 
dfa <- (G/(G - 1)) * (N - 1)/logit.model$df.residual 
c_vcov <- dfa * vcovHC(logit.model, type = "HC1", cluster = "group", adjust = T) 
coeftest(logit.model, vcov = c_vcov) 

z test of coefficients: 

       Estimate Std. Error z value Pr(>|z|)  
(Intercept) 14.2168894 5.0830412 2.7969 0.0051591 ** 
weight  -0.0042281 0.0010915 -3.8736 0.0001072 *** 
mpg   -0.1524015 0.1127248 -1.3520 0.1763823 

但是,沒有以前我得到完全相同的標準錯誤。我想知道我是否做錯了什麼,或者是否有另一個可以用來獲得與Stata完全相同的結果的軟件包。

+1

你能請張貼的結果你確實是得到了每一方法?一個[可重現的例子](http://tinyurl.com/reproducible-000)會更好... –

+0

謝謝,我只是發佈了一個可複製的例子的結果。 – user2246905

回答

3

我能夠做複製在Stata結果:

clrobustse <- function(fit.model, clusterid) { 
    rank=fit.model$rank 
    N.obs <- length(clusterid)    
    N.clust <- length(unique(clusterid)) 
    dfc <- N.clust/(N.clust-1)      
    vcv <- vcov(fit.model) 
    estfn <- estfun(fit.model) 
    uj <- apply(estfn, 2, function(x) tapply(x, clusterid, sum)) 
    N.VCV <- N.obs * vcv 
    ujuj.nobs <- crossprod(uj)/N.obs 
    vcovCL <- dfc*(1/N.obs * (N.VCV %*% ujuj.nobs %*% N.VCV)) 
    coeftest(fit.model, vcov=vcovCL) 
} 
clrobustse(logit.model, data$rep78)