2017-06-23 108 views

Estimating a probit regression model with optim()

I need to program a probit regression model by hand, without using glm. I will use optim to minimize the negative log-likelihood directly.

I wrote the code below, but it does not work and gives the error:

cannot coerce type 'closure' to vector of type 'double'

# load data: provided via the link at the bottom 
# (`read.dta` lives in the `foreign` package) 
library(foreign) 
Datospregunta2a <- read.dta("problema2_1.dta") 
attach(Datospregunta2a) 

# model matrix `X` and response `Y` 
X <- cbind(1, associate_professor, full_professor, emeritus_professor, other_rank) 
Y <- volunteer 

# number of regression coefficients 
K <- ncol(X) 

# initial guess on coefficients 
vi <- lm(volunteer ~ associate_professor, full_professor, emeritus_professor, other_rank)$coefficients 

# negative log-likelihood 
probit.nll <- function (beta) { 
    exb <- exp(X%*%beta) 
    prob<- rnorm(exb) 
    logexb <- log(prob) 
    y0 <- (1-y) 
    logexb0 <- log(1-prob) 
    yt <- t(y) 
    y0t <- t(y0) 
    -sum(yt%*%logexb + y0t%*%logexb0) 
    } 

# gradient 
probit.gr <- function (beta) { 
    grad <- numeric(K) 
    exb <- exp(X%*%beta) 
    prob <- rnorm(exb) 
    for (k in 1:K) grad[k] <- sum(X[,k]*(y - prob)) 
    return(-grad) 
    } 

# direct minimization 
fit <- optim(vi, probit.nll, gr = probit.gr, method = "BFGS", hessian = TRUE) 

Data: https://drive.google.com/file/d/0B06Id6VJyeb5OTFjbHVHUE42THc/view?usp=sharing


The moment I see 'read.dta("problema2_1.dta")' I suspect that you desperately need to read [MCVE] –


Thanks for your comment. I am an R noob; I used pnorm, changed y to Y, added the "+", and the program works! –

Answer


Case sensitivity

Y and y are different, so you should use Y instead of y in the functions probit.nll and probit.gr you defined.

Neither function looks correct to me, either. The most obvious problem is the presence of rnorm. The following versions are correct.

Negative log-likelihood function

# requires model matrix `X` and binary response `Y` 
probit.nll <- function (beta) { 
    # linear predictor 
    eta <- X %*% beta 
    # probability 
    p <- pnorm(eta) 
    # negative log-likelihood 
    -sum((1 - Y) * log(1 - p) + Y * log(p)) 
    } 
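A side note on numerical stability (my own addition, not part of the original answer): when eta is large in magnitude, p can round to exactly 0 or 1 in double precision and log(p) or log(1 - p) degenerates. pnorm() can return the log-probability directly via log.p = TRUE, which avoids this. A minimal sketch with its own toy data (the names probit.nll.naive and probit.nll.stable are mine):

```r
# toy data (my own, just so this sketch runs standalone); the large
# slope pushes some fitted probabilities to exactly 0 or 1 in doubles
set.seed(1)
X <- cbind(1, rnorm(50))
beta <- c(0.5, 8)
Y <- rbinom(50, 1, pnorm(X %*% beta))

# version as in the answer: log(p) degenerates when p rounds to 0 or 1
probit.nll.naive <- function (beta) {
    p <- pnorm(X %*% beta)
    -sum((1 - Y) * log(1 - p) + Y * log(p))
    }

# stable variant: let pnorm() return the log-probability directly
probit.nll.stable <- function (beta) {
    eta <- X %*% beta
    -sum(Y * pnorm(eta, log.p = TRUE) +
         (1 - Y) * pnorm(eta, lower.tail = FALSE, log.p = TRUE))
    }
```

At moderate coefficients both versions agree; at extreme coefficients like the beta above the naive version can produce NaN or -Inf while the stable one stays finite.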

Gradient function

# requires model matrix `X` and binary response `Y` 
probit.gr <- function (beta) { 
    # linear predictor 
    eta <- X %*% beta 
    # probability 
    p <- pnorm(eta) 
    # chain rule 
    u <- dnorm(eta) * (Y - p)/(p * (1 - p)) 
    # gradient 
    -crossprod(X, u) 
    } 

Initial parameter values from lm()

That does not sound like a reasonable idea. We should never apply linear regression to binary data.

However, focusing purely on the use of lm: you need + rather than , to separate the covariates on the right-hand side of the formula, i.e. lm(volunteer ~ associate_professor + full_professor + emeritus_professor + other_rank).


Reproducible example

Let's generate a toy dataset:

set.seed(0) 
# model matrix 
X <- cbind(1, matrix(runif(300, -2, 1), 100)) 
# coefficients 
b <- runif(4) 
# response 
Y <- rbinom(100, 1, pnorm(X %*% b)) 

# `glm` estimate 
GLM <- glm(Y ~ X - 1, family = binomial(link = "probit")) 

# our own estimation via `optim` 
# I am using `b` as initial parameter values (being lazy) 
fit <- optim(b, probit.nll, gr = probit.gr, method = "BFGS", hessian = TRUE) 

# comparison 
unname(coef(GLM)) 
# 0.62183195 0.38971121 0.06321124 0.44199523 

fit$par 
# 0.62183540 0.38971287 0.06321318 0.44199659 

They are very close to each other!
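Since optim() was called with hessian = TRUE, the returned Hessian of the negative log-likelihood (the observed information) can also be inverted to get approximate standard errors. A self-contained sketch repeating the toy setup (this addition is mine, not part of the original answer; note glm reports standard errors based on the expected information from IRLS, so small discrepancies from the observed-information values are normal):

```r
# same toy data as in the reproducible example above
set.seed(0)
X <- cbind(1, matrix(runif(300, -2, 1), 100))
b <- runif(4)
Y <- rbinom(100, 1, pnorm(X %*% b))

probit.nll <- function (beta) {
    p <- pnorm(X %*% beta)
    -sum((1 - Y) * log(1 - p) + Y * log(p))
    }

fit <- optim(b, probit.nll, method = "BFGS", hessian = TRUE)

# the inverse of the observed information approximates cov(beta-hat)
se <- sqrt(diag(solve(fit$hessian)))

# compare with glm's (expected-information) standard errors
GLM <- glm(Y ~ X - 1, family = binomial(link = "probit"))
cbind(optim = se, glm = unname(coef(summary(GLM))[, "Std. Error"]))
```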


Thanks! Do you have an idea of how to program the marginal effects without using mfx? –
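On that follow-up about marginal effects without mfx: for a probit model, dP(Y = 1)/dx_k = dnorm(eta) * beta_k, so the average marginal effect (AME) is just the sample mean of dnorm(eta) times each coefficient. A hand-rolled sketch on the same toy data (my own illustration, not part of the original answer):

```r
# same toy data as in the reproducible example above
set.seed(0)
X <- cbind(1, matrix(runif(300, -2, 1), 100))
b <- runif(4)
Y <- rbinom(100, 1, pnorm(X %*% b))

# any consistent coefficient estimate works; glm is convenient here
beta.hat <- coef(glm(Y ~ X - 1, family = binomial(link = "probit")))

# dP(Y = 1)/dx_k = dnorm(eta) * beta_k, averaged over the sample
eta <- X %*% beta.hat
AME <- mean(dnorm(eta)) * beta.hat
AME
```

Standard errors for the AME would additionally need the delta method (or a bootstrap), which is what packages like mfx automate.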