2010-10-02 82 views
6

我想在R中運行一個兩級probit最小二乘法迴歸法。有誰知道如何做到這一點?那裏有包嗎?我知道使用Stata可以做到這一點,所以我想可以用R來做。R中的兩級最小二乘法

回答

8

當你說'兩階段概率最小二乘'時,你可能想要更具體一些。既然你提到了一個實現這個的Stata程序,我猜你正在談論CDSIMEQ包,這個包實現了Heckit模型(又稱通用Tobit,又名Tobit類型II等)的Amemiya(1978)程序。正如Grant所說,systemfit會爲你做一個Tobit,但不會有兩個方程式。 MicEcon軟件包確實有一個Heckit(但是這個軟件包已經分裂了很多次,我不知道它現在在哪裏)。

如果你想CDSIMEQ做什麼,它可以很容易地在河中實現我寫的複製CDSIMEQ功能:

tspls <- function(formula1, formula2, data) { 
    # The Continous model 
    mf1 <- model.frame(formula1, data) 
    y1 <- model.response(mf1) 
    x1 <- model.matrix(attr(mf1, "terms"), mf1) 

    # The dicontionous model 
    mf2 <- model.frame(formula2, data) 
    y2 <- model.response(mf2) 
    x2 <- model.matrix(attr(mf2, "terms"), mf2) 

    # The matrix of all the exogenous variables 
    X <- cbind(x1, x2) 
    X <- X[, unique(colnames(X))] 

    J1 <- matrix(0, nrow = ncol(X), ncol = ncol(x1)) 
    J2 <- matrix(0, nrow = ncol(X), ncol = ncol(x2)) 
    for (i in 1:ncol(x1)) J1[match(colnames(x1)[i], colnames(X)), i] <- 1 
    for (i in 1:ncol(x2)) J2[match(colnames(x2)[i], colnames(X)), i] <- 1 

    # Step 1: 
    cat("\n\tNOW THE FIRST STAGE REGRESSION") 
    m1 <- lm(y1 ~ X - 1) 
    m2 <- glm(y2 ~ X - 1, family = binomial(link = "probit")) 
    print(summary(m1)) 
    print(summary(m2)) 

    yhat1 <- m1$fitted.values 
    yhat2 <- X %*% coef(m2) 

    PI1 <- m1$coefficients 
    PI2 <- m2$coefficients 
    V0 <- vcov(m2) 
    sigma1sq <- sum(m1$residuals^2)/m1$df.residual 
    sigma12 <- 1/length(y2) * sum(y2 * m1$residuals/dnorm(yhat2)) 

    # Step 2: 
    cat("\n\tNOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS") 

    m1 <- lm(y1 ~ yhat2 + x1 - 1) 
    m2 <- glm(y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit")) 
    sm1 <- summary(m1) 
    sm2 <- summary(m2) 
    print(sm1) 
    print(sm2) 

    # Step 3: 
    cat("\tNOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS\n\n") 
    gamma1 <- m1$coefficients[1] 
    gamma2 <- m2$coefficients[1] 

    cc <- sigma1sq - 2 * gamma1 * sigma12 
    dd <- gamma2^2 * sigma1sq - 2 * gamma2 * sigma12 
    H <- cbind(PI2, J1) 
    G <- cbind(PI1, J2) 

    XX <- crossprod(X)       # X'X 
    HXXH <- solve(t(H) %*% XX %*% H)   # (H'X'XH)^(-1) 
    HXXVXXH <- t(H) %*% XX %*% V0 %*% XX %*% H # H'X'V0X'XH 
    Valpha1 <- cc * HXXH + gamma1^2 * HXXH %*% HXXVXXH %*% HXXH 

    GV <- t(G) %*% solve(V0) # G'V0^(-1) 
    GVG <- solve(GV %*% G)  # (G'V0^(-1)G)^(-1) 
    Valpha2 <- GVG + dd * GVG %*% GV %*% solve(XX) %*% solve(V0) %*% G %*% GVG 

    ans1 <- coef(sm1) 
    ans2 <- coef(sm2) 

    ans1[,2] <- sqrt(diag(Valpha1)) 
    ans2[,2] <- sqrt(diag(Valpha2)) 
    ans1[,3] <- ans1[,1]/ans1[,2] 
    ans2[,3] <- ans2[,1]/ans2[,2] 
    ans1[,4] <- 2 * pt(abs(ans1[,3]), m1$df.residual, lower.tail = FALSE) 
    ans2[,4] <- 2 * pnorm(abs(ans2[,3]), lower.tail = FALSE) 

    cat("Continuous:\n") 
    print(ans1) 
    cat("Dichotomous:\n") 
    print(ans2) 
} 

爲了便於比較,我們可以從CDSIMEQ在筆者複製其樣本article about the package

> library(foreign) 
> cdsimeq <- read.dta("http://www.stata-journal.com/software/sj3-2/st0038/cdsimeq.dta") 
> tspls(continuous ~ exog3 + exog2 + exog1 + exog4, 
+  dichotomous ~ exog1 + exog2 + exog5 + exog6 + exog7, 
+  data = cdsimeq) 

     NOW THE FIRST STAGE REGRESSION 
Call: 
lm(formula = y1 ~ X - 1) 

Residuals: 
     Min  1Q Median  3Q  Max 
-1.885921 -0.438579 -0.006262 0.432156 2.133738 

Coefficients: 
       Estimate Std. Error t value Pr(>|t|)  
X(Intercept) 0.010752 0.020620 0.521 0.602187  
Xexog3  0.158469 0.021862 7.249 8.46e-13 *** 
Xexog2  -0.009669 0.021666 -0.446 0.655488  
Xexog1  0.159955 0.021260 7.524 1.19e-13 *** 
Xexog4  0.316575 0.022456 14.097 < 2e-16 *** 
Xexog5  0.497207 0.021356 23.282 < 2e-16 *** 
Xexog6  -0.078017 0.021755 -3.586 0.000352 *** 
Xexog7  0.161177 0.022103 7.292 6.23e-13 *** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.6488 on 992 degrees of freedom 
Multiple R-squared: 0.5972,  Adjusted R-squared: 0.594 
F-statistic: 183.9 on 8 and 992 DF, p-value: < 2.2e-16 


Call: 
glm(formula = y2 ~ X - 1, family = binomial(link = "probit")) 

Deviance Residuals: 
    Min  1Q Median  3Q  Max 
-2.49531 -0.59244 0.01983 0.59708 2.41810 

Coefficients: 
      Estimate Std. Error z value Pr(>|z|)  
X(Intercept) 0.08352 0.05280 1.582 0.113692  
Xexog3  0.21345 0.05678 3.759 0.000170 *** 
Xexog2  0.21131 0.05471 3.862 0.000112 *** 
Xexog1  0.45591 0.06023 7.570 3.75e-14 *** 
Xexog4  0.39031 0.06173 6.322 2.57e-10 *** 
Xexog5  0.75955 0.06427 11.818 < 2e-16 *** 
Xexog6  0.85461 0.06831 12.510 < 2e-16 *** 
Xexog7  -0.16691 0.05653 -2.953 0.003152 ** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1) 

    Null deviance: 1386.29 on 1000 degrees of freedom 
Residual deviance: 754.14 on 992 degrees of freedom 
AIC: 770.14 

Number of Fisher Scoring iterations: 6 


     NOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS 
Call: 
lm(formula = y1 ~ yhat2 + x1 - 1) 

Residuals: 
    Min  1Q Median  3Q  Max 
-2.32152 -0.53160 0.04886 0.53502 2.44818 

Coefficients: 
       Estimate Std. Error t value Pr(>|t|)  
yhat2   0.257592 0.021451 12.009 <2e-16 *** 
x1(Intercept) 0.012185 0.024809 0.491 0.623  
x1exog3  0.042520 0.026735 1.590 0.112  
x1exog2  0.011854 0.026723 0.444 0.657  
x1exog1  0.007773 0.028217 0.275 0.783  
x1exog4  0.318636 0.028311 11.255 <2e-16 *** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.7803 on 994 degrees of freedom 
Multiple R-squared: 0.4163,  Adjusted R-squared: 0.4128 
F-statistic: 118.2 on 6 and 994 DF, p-value: < 2.2e-16 


Call: 
glm(formula = y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit")) 

Deviance Residuals: 
    Min  1Q Median  3Q  Max 
-2.49610 -0.58595 0.01969 0.59857 2.41281 

Coefficients: 
       Estimate Std. Error z value Pr(>|z|)  
yhat1   1.26287 0.16061 7.863 3.75e-15 *** 
x2(Intercept) 0.07080 0.05276 1.342 0.179654  
x2exog1  0.25093 0.06466 3.880 0.000104 *** 
x2exog2  0.22604 0.05389 4.194 2.74e-05 *** 
x2exog5  0.12912 0.09510 1.358 0.174544  
x2exog6  0.95609 0.07172 13.331 < 2e-16 *** 
x2exog7  -0.37128 0.06759 -5.493 3.94e-08 *** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1) 

    Null deviance: 1386.29 on 1000 degrees of freedom 
Residual deviance: 754.21 on 993 degrees of freedom 
AIC: 768.21 

Number of Fisher Scoring iterations: 6 

     NOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS 

Continuous: 
       Estimate Std. Error t value Pr(>|t|) 
yhat2   0.25759209 0.1043073 2.46955009 0.01369540 
x1(Intercept) 0.01218500 0.1198713 0.10165068 0.91905445 
x1exog3  0.04252006 0.1291588 0.32920764 0.74206810 
x1exog2  0.01185438 0.1290754 0.09184073 0.92684309 
x1exog1  0.00777347 0.1363643 0.05700519 0.95455252 
x1exog4  0.31863627 0.1367881 2.32941597 0.02003661 
Dichotomous: 
       Estimate Std. Error z value  Pr(>|z|) 
yhat1   1.26286574 0.7395166 1.7076909 0.0876937093 
x2(Intercept) 0.07079775 0.2666447 0.2655134 0.7906139867 
x2exog1  0.25092561 0.3126763 0.8025092 0.4222584495 
x2exog2  0.22603717 0.2739307 0.8251618 0.4092797527 
x2exog5  0.12911922 0.4822986 0.2677163 0.7889176766 
x2exog6  0.95609385 0.2823662 3.3860070 0.0007091758 
x2exog7  -0.37128221 0.3265478 -1.1369920 0.2555416141 
2

在R中有幾個包可用來做兩個狀態最小二乘。這裏有幾個

  1. SEM:Two-Stage Least Squares
  2. 澤裏格柯:鏈接刪除,不再功能(11年7月28日)

讓我知道如果這些滿足您的目的。

+0

謝謝,我會試試看。這似乎符合我的需求。 – 2010-10-02 01:59:46

2

systemfit也會伎倆。