2016-09-19 70 views
4

我有從glm中擬合的係數,我想預測一組新數據的期望值。如果我有模型對象,使用predict()會很簡單。但是,我現在不在現場,出於數據保密原因,我不再擁有模型對象。我只有摘要對象,使用包含模型係數的摘要(模型)生成。如何根據沒有模型對象的ns樣條參數進行預測

使用係數來預測簡單模型的期望值是很容易的。但是,我想知道如何在模型包含三次樣條ns()時執行此操作。當模型也包含分類變量時,任何捷徑都會被讚賞。

這是一個簡單的例子。

library(splines) 
dat <- data.frame(x=1:500, z=runif(500), k=as.factor(sample(c("a","b"), size=500, replace=TRUE))) 
kvals <- data.frame(kn=c("a","b"),kv=c(20,30)) 
dat$y = dat$x + (40*dat$z)^2 + kvals$kv[match(dat$k,kvals$kn)] + rnorm(500,0,30) 
# Fit model 
library(splines) 
mod <- glm(y ~ x + ns(z,df=2) + k,data=dat) 
# Create new dataset 
dat.new <- expand.grid(x=1:3,z=seq(0.2,0.4,0.1),k="b") 
# Predict expected values in the usual way 
predict(mod,newdata=dat.new) 
summ <- summary(mod) 
rm(mod) 
# Now, how do I predict using just the summary object and dat.new? 
+0

http://stats.stackexchange.com/a/101484/11849 – Roland

+0

如果您使用的功能htat不在基本集中,您應該始終添加所需的庫調用。 –

回答

1

有可能解決這一更有效的方法,但這裏有一個起點,讓你建立實施羅蘭簡要建議的策略。該summ對象有定義樣條函數所需的信息,但它是一種葬

names(summ) 
    [1] "call"   "terms"   "family"   "deviance"  "aic"   
    [6] "contrasts"  "df.residual" "null.deviance" "df.null"  "iter"   
    [11] "deviance.resid" "coefficients" "aliased"  "dispersion"  "df"    
    [16] "cov.unscaled" "cov.scaled"  

而綜觀目前terms葉的結構,我們看到花細節埋設較深仍在predvars subleaf內:

str(summ$terms) 
Classes 'terms', 'formula' language y ~ x + ns(z, df = 2) + k 
    ..- attr(*, "variables")= language list(y, x, ns(z, df = 2), k) 
    ..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ... 
    .. ..- attr(*, "dimnames")=List of 2 
    .. .. ..$ : chr [1:4] "y" "x" "ns(z, df = 2)" "k" 
    .. .. ..$ : chr [1:3] "x" "ns(z, df = 2)" "k" 
    ..- attr(*, "term.labels")= chr [1:3] "x" "ns(z, df = 2)" "k" 
    ..- attr(*, "order")= int [1:3] 1 1 1 
    ..- attr(*, "intercept")= int 1 
    ..- attr(*, "response")= int 1 
    ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
    ..- attr(*, "predvars")= language list(y, x, ns(z, knots = structure(0.514993450604379, .Names = "50%"), Boundary.knots = c(0.00118412892334163, 0.99828373757191), intercept = FALSE), k) 
    ..- attr(*, "dataClasses")= Named chr [1:4] "numeric" "numeric" "nmatrix.2" "factor" 
    .. ..- attr(*, "names")= chr [1:4] "y" "x" "ns(z, df = 2)" "k" 

所以拉出來的屬性:

str(attributes(summ$terms)$predvars) 
language list(y, x, ns(z, knots = structure(0.514993450604379, .Names = "50%"), 
       Boundary.knots = c(0.00118412892334163, 0.99828373757191), intercept = FALSE), k) 

你可以看到它是PO ssible如果你提供所需的X,Y,Z,和k值恢復花:

with(dat, ns(z, knots = 0.514993450604379, Boundary.knots = c(0.00118412892334163, 
0.99828373757191), intercept = FALSE)) 
#--- 
       1    2 
    [1,] 5.760419e-01 -1.752762e-01 
    [2,] 2.467001e-01 -1.598936e-01 
    [3,] 4.392684e-01 4.799757e-01 
snipping .... 
[498,] 4.965628e-01 -2.576437e-01 
[499,] 5.627389e-01 1.738909e-02 
[500,] 2.393920e-02 -1.611872e-02 
attr(,"degree") 
[1] 3 
attr(,"knots") 
[1] 0.5149935 
attr(,"Boundary.knots") 
[1] 0.001184129 0.998283738 
attr(,"intercept") 
[1] FALSE 
attr(,"class") 
[1] "ns"  "basis" "matrix" 

您可以構建一個替代dat,如果你知道你的數據的極端。請參閱?ns及其鏈接的其他幫助頁面。

相關問題