2014-06-25 34 views
0

下列數據:相同的二次和三次預測

require(dplyr) 
require(ggplot2) 

ds <- read.table(header = TRUE, text =" 
obs id year attend 
       1 47 2000  1 
       2 47 2001  3 
       3 47 2002  5 
       4 47 2003  8 
       5 47 2004  6 
       6 47 2005  4 
       7 47 2006  2 
       8 47 2007  1 
       9 47 2008  2 
       10 47 2009  3 
       11 47 2010  4 
       12 47 2011  5 

       ") 
print(ds) 

我想計算線性模型的預測值

linear<- predict(lm(attend ~ year, ds)) 
quadratic<- predict(lm(attend ~ year + I(year^2),ds)) 
cubic<-  predict(lm(attend ~ year + I(year^2) + I(year^3),ds)) 

ds<- ds %>% dplyr::mutate(linear=linear, quadratic=quadratic, cubic=cubic) 
print(ds) 

    obs id year attend linear quadratic cubic 
1 1 47 2000  1 3.820513 3.500000 3.500000 
2 2 47 2001  3 3.792541 3.646853 3.646853 
3 3 47 2002  5 3.764569 3.758741 3.758741 
4 4 47 2003  8 3.736597 3.835664 3.835664 
5 5 47 2004  6 3.708625 3.877622 3.877622 
6 6 47 2005  4 3.680653 3.884615 3.884615 
7 7 47 2006  2 3.652681 3.856643 3.856643 
8 8 47 2007  1 3.624709 3.793706 3.793706 
9 9 47 2008  2 3.596737 3.695804 3.695804 
10 10 47 2009  3 3.568765 3.562937 3.562937 
11 11 47 2010  4 3.540793 3.395105 3.395105 
12 12 47 2011  5 3.512821 3.192308 3.192308 

問題:儘管時間序列具有明顯的立方體形狀,二次和三次預測是相同的。爲什麼?這是一個錯誤嗎?

+1

然而,這可能是防止因使用'聚()'另一個錯誤。當他們纔會知道。 –

回答

6

這是由於2011^3是一個非常大的數字(更大的塔,這是造成coeffiicent爲NA被退回。如果你已檢查的車型,你會注意到這一點。

coef(lm(attend ~ year + I(year^2) + I(year^3),ds)) 
# (Intercept)   year  I(year^2)  I(year^3) 
# -7.025524e+04 7.009441e+01 -1.748252e-02   NA 

更明智的使用poly創建正交多項式

linear<- predict(lm(attend ~ year, ds)) 
quadratic<- predict(lm(attend ~ poly(year,2),ds)) 
cubic<-  predict(lm(attend ~ poly(year,3),ds)) 


ds<- (ds %>% dplyr::mutate(linear=linear, quadratic=quadratic, cubic=cubic)) 
ds 
# obs id year attend linear quadratic  cubic 
# 1 1 47 2000  1 3.820513 3.500000 0.7435897 
# 2 2 47 2001  3 3.792541 3.646853 3.8974359 
# 3 3 47 2002  5 3.764569 3.758741 5.5128205 
# 4 4 47 2003  8 3.736597 3.835664 5.9238539 
# 5 5 47 2004  6 3.708625 3.877622 5.4646465 
# 6 6 47 2005  4 3.680653 3.884615 4.4693085 
# 7 7 47 2006  2 3.652681 3.856643 3.2719503 
# 8 8 47 2007  1 3.624709 3.793706 2.2066822 
# 9 9 47 2008  2 3.596737 3.695804 1.6076146 
# 10 10 47 2009  3 3.568765 3.562937 1.8088578 
# 11 11 47 2010  4 3.540793 3.395105 3.1445221 
# 12 12 47 2011  5 3.512821 3.192308 5.9487179 
+0

這是非常合情合理的。我做忘記集中,謝謝! – andrey

相關問題