2016-04-30

Specifying regressors in glm() in R

I have loaded a dataset as follows:

mydata <- read.csv("data.csv", header = TRUE) 

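It contains the variable 'y' (binary, 0 or 1) and 60 regressors. Three of the regressors are 'avg', 'age', and 'income' (all three are numeric).

I want to fit a regression with glm(), as shown below: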

model <-glm(y~., data = mydata, family = binomial) 

How should I proceed if I do not want to use the three named variables (avg, age, and income) in glm(), and want to use only the remaining 57 variables?

Answers


You can simply exclude those three variables from mydata before running glm().

Here I create some sample data:

set.seed(1) 
mydata<-replicate(10,rnorm(100,300,50)) 
mydata<-data.frame(dv=sample(c(0,1),100,replace = TRUE),mydata) 

> head(mydata) 
  dv       X1       X2       X3       X4       X5       X6       X7       X8       X9      X10
1  1 268.6773 268.9817 320.4701 344.6837 353.7220 303.8652 282.9467 264.6216 245.6546 222.9299
2  1 309.1822 302.1058 384.4437 247.6351 394.7827 285.1566 375.1212 398.5786 208.6958 309.7161
3  1 258.2186 254.4539 379.3294 398.5669 269.8501 240.8379 326.4154 295.5001 349.7641 313.2211
4  0 379.7640 307.9014 283.4546 280.8184 280.4566 300.5646 327.1096 299.2991 299.4069 244.0632
5  0 316.4754 267.2708 185.7382 382.7073 279.1889 349.5801 293.1663 243.8272 270.0186 332.5476
6  0 258.9766 388.3644 424.8831 375.6106 281.2171 379.6984 243.1633 232.7935 291.1026 248.3550

If I fit the model to this data as it is, every variable ends up on the right-hand side:

model<-glm(data=mydata, dv~.,family=binomial(link = 'logit')) 

> summary(model) 

Call: 
glm(formula = dv ~ ., family = binomial(link = "logit"), data = mydata) 

Deviance Residuals: 
    Min  1Q Median  3Q  Max 
-1.8891 -1.0853 -0.5163 1.0237 1.8303 

Coefficients: 
       Estimate Std. Error z value Pr(>|z|) 
(Intercept) -2.4330825 4.1437180 -0.587 0.5571 
X1   -0.0020482 0.0049025 -0.418 0.6761 
X2   -0.0059021 0.0046298 -1.275 0.2024 
X3   0..0047991 2.568 0.0102 * 
X4   0.0024804 0.0046856 0.529 0.5966 
X5   0.0025348 0.0039545 0.641 0.5215 
X6   -0.0005905 0.0047417 -0.125 0.9009 
X7   -0.0001758 0.0040737 -0.043 0.9656 
X8   0.0042362 0.0041170 1.029 0.3035 
X9   -0.0007664 0.0042471 -0.180 0.8568 
X10   -0.0042089 0.0043094 -0.977 0.3287 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1) 

    Null deviance: 138.59 on 99 degrees of freedom 
Residual deviance: 125.11 on 89 degrees of freedom 
AIC: 147.11 

Number of Fisher Scoring iterations: 4 

Now I exclude X1 and X2 from mydata and fit the model again:

mydata2<-mydata[,-match(c('X1','X2'), colnames(mydata))] 

model2<-glm(data=mydata2, dv~.,family=binomial(link = 'logit')) 
> summary(model2) 

Call: 
glm(formula = dv ~ ., family = binomial(link = "logit"), data = mydata2) 

Deviance Residuals: 
    Min  1Q Median  3Q  Max 
-1.8983 -1.0724 -0.4521 1.1132 1.7792 

Coefficients: 
       Estimate Std. Error z value Pr(>|z|) 
(Intercept) -4.8725545 3.6357314 -1.340 0.18019 
X3   0.0124982 0.0047930 2.608 0.00912 ** 
X4   0.0031911 0.0045971 0.694 0.48758 
X5   0.0015992 0.0038101 0.420 0.67467 
X6   -0.0003295 0.0046554 -0.071 0.94357 
X7   0.0003372 0.0039961 0.084 0.93275 
X8   0.0038889 0.0040737 0.955 0.33977 
X9   -0.0010014 0.0042078 -0.238 0.81189 
X10   -0.0041691 0.0042232 -0.987 0.32356 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1) 

    Null deviance: 138.59 on 99 degrees of freedom 
Residual deviance: 126.93 on 91 degrees of freedom 
AIC: 144.93 

Number of Fisher Scoring iterations: 4 
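
Applied to the variable names in the question, the same idea could look like the sketch below. The data frame here is simulated only so the example runs on its own; the columns x1 to x4 are made-up stand-ins for the asker's remaining regressors, and %in% is swapped in for match() because it simply ignores a name that is not present, whereas match() returns NA for it.

```r
# Simulated stand-in for the asker's data (x1..x4 are hypothetical names)
set.seed(1)
n <- 200
mydata <- data.frame(
  y      = rbinom(n, 1, 0.5),
  avg    = rnorm(n),
  age    = rnorm(n, 40, 10),
  income = rnorm(n, 50000, 10000),
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n)
)

# Names of the regressors to leave out of the model
drop_vars <- c("avg", "age", "income")

# Keep every column whose name is not in drop_vars
mydata_reduced <- mydata[, !(names(mydata) %in% drop_vars), drop = FALSE]

# "." now expands to the remaining regressors only
model <- glm(y ~ ., data = mydata_reduced, family = binomial)
names(coef(model))
```

The original mydata is left untouched, so the full set of columns is still available for other models.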

The . ("everything") on the right-hand side of the formula can be modified by subtracting terms:

model <- glm(y~ . - avg - age - income, data = mydata, 
    family = binomial)
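
As a quick sanity check, here is a minimal sketch that fits the model both ways on the sample data from the other answer (dv and X1 to X10, not the asker's real columns) and compares the coefficients. Subtracting terms in the formula and dropping the columns beforehand should produce the same fit.

```r
# Sample data as generated in the other answer
set.seed(1)
mydata <- replicate(10, rnorm(100, 300, 50))
mydata <- data.frame(dv = sample(c(0, 1), 100, replace = TRUE), mydata)

# Approach 1: subtract terms from "." in the formula
m_formula <- glm(dv ~ . - X1 - X2, data = mydata, family = binomial)

# Approach 2: drop the columns first, then use "." unchanged
keep <- !(names(mydata) %in% c("X1", "X2"))
m_dropped <- glm(dv ~ ., data = mydata[, keep], family = binomial)

# Compare the two sets of coefficient estimates
all.equal(coef(m_formula), coef(m_dropped))
```

The formula version avoids creating a second data frame, which is convenient when the excluded variables are still needed elsewhere, for example for plotting or for a different model.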