2017-09-24

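Time series analysis with covariates

I have a dataset containing data from several thousand individuals, for whom a parameter x was measured every year over the last 9 years.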

Basically, they are in a data frame DF:

id,year,x,feature 
A,2016,376,female 
A,2015,391,female 
A,2014,376,female 
A,2013,373,female 
A,2012,347,female 
A,2011,330,female 
B,2016,398,male 
B,2015,391,male 
B,2014,410,male 
B,2013,393,male 
B,2012,408,male 
B,2011,288,male 
C,2016,2464,male 
C,2015,2465,male 
C,2014,2500,male 
C,2013,2215,male 
C,2012,2228,male 
C,2011,1839,male 

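I would like to estimate different models on these time series, something like

predict(x(t)) = F(x(t-1), x(t-2), ..., x(t-n), feature, id (as a random factor))

I can see how to do autoregressive modelling with ts, but that gives one model per individual, and I want a global prediction based on the time history and the features (with its inherent problems).

Because the data are highly autocorrelated, lm is not a good idea. Any suggestions?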

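You could try an "autoregressive moving average model with exogenous inputs" (ARMAX). See, for example, the 'dse' package: https://cran.r-project.org/web/packages/dse/dse.pdf

I tried to read the documentation, but I must admit it is esoteric for an MD like me. I don't know how to get my data frame into dse.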

Answers

There are many possible models, but here is a mixed-effects model with an AR1 correlation structure that you could try.

library(nlme) 

fm <- lme(x ~ year + feature, random = ~ year | id, DF, 
    correlation = corAR1(form = ~ year | id)) 
summary(fm) 
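Written out, the lme() call corresponds roughly to the model below. This is a reading of the formula arguments, not something stated in the answer; the symbols β, b, Ψ and φ are notation introduced here, and "female" is assumed to be the baseline level of feature (R's default treatment contrasts with alphabetical levels).

```latex
\begin{aligned}
x_{it} &= \beta_0 + \beta_1\,\mathrm{year}_{it}
        + \beta_2\,\mathbf{1}[\mathrm{feature}_i = \text{male}]
        + b_{0i} + b_{1i}\,\mathrm{year}_{it} + \varepsilon_{it}, \\
(b_{0i},\, b_{1i})^{\top} &\sim N(0, \Psi), \qquad
\operatorname{Cor}(\varepsilon_{it}, \varepsilon_{is}) = \phi^{|t-s|}
\end{aligned}
```

Here i indexes individuals (id) and t indexes years. The β terms are the global part shared by everyone, the b terms are each individual's random intercept and slope, and φ is the AR1 parameter that summary(fm) reports as Phi.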

Here is a plot of the data:

library(ggplot2) 

ggplot(DF, aes(year, x, group = id, col = feature)) + geom_line() + geom_point() 

[Screenshot: x plotted against year, one line per id, coloured by feature]

Note: We assume this input data:

Lines <- " 
id,year,x,feature 
A,2016,376,female 
A,2015,391,female 
A,2014,376,female 
A,2013,373,female 
A,2012,347,female 
A,2011,330,female 
B,2016,398,male 
B,2015,391,male 
B,2014,410,male 
B,2013,393,male 
B,2012,408,male 
B,2011,288,male 
C,2016,2464,male 
C,2015,2465,male 
C,2014,2500,male 
C,2013,2215,male 
C,2012,2228,male 
C,2011,1839,male" 
library(zoo) 
DF <- read.csv(text = Lines, strip.white = TRUE) 

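There are many choices for the function F() in your statement.

Within the linear class, however, you can use vector generalized linear models (VGLMs, via vglm()) to fit generalized linear models with an ARMA (or GARCH) structure that also include covariates. For example, assuming the random errors are normally distributed (the default), you can use the family function ARff() from the package VGAMextra, as shown below.

A second option is the nonparametric version, i.e. VGAMs, which supports smart prediction. The only drawback is that vglms/vgams do not handle random effects.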

library(VGAM) 
library(VGAMextra) 
# Fit a linear model to the mean of the normal distribution,
# allowing an AR(3) structure. Use the modelling function vglm() and
# the family function ARff().
df.read <- DF # DF as given by G.G.
fit.Lines <- vglm(x ~ feature, ARff(order = 3,
                  zero = c("Var", "ARcoeff")),
                  data = df.read, trace = TRUE)
coef(fit.Lines, matrix = TRUE)
summary(fit.Lines, HD = FALSE)

# Plot the fitted values against year, coloured by feature.
with(df.read, plot(fitted.values(fit.Lines) ~ year,
                   ylim = c(0, 3000),
                   pch = 19, col = as.factor(feature)))


# Using VGAMs: here the family function uninormal() is used, with the
# lagged values of x built by embed() entering as smooth terms.

df.read2 <- data.frame(embed(df.read$x, 4)) 
names(df.read2) <- c("x", "xLag1", "xLag2", "xLag3") 
df.read2 <- transform(df.read2, year = df.read$year[-c(1:3)], 
         feature = df.read$feature[-c(1:3)]) 
fit.Lines.vgams <- vgam(x ~ sm.bs(xLag1) + sm.bs(xLag2) + 
         sm.bs(xLag3) + feature + year, 
        uninormal, data = df.read2, trace = TRUE) 

with(df.read2, plot(fitted.values(fit.Lines.vgams) ~ year, 
       ylim = c(0, 3000), 
       pch = 19, col = as.factor(feature)))
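
One thing to watch with the embed() step above: it works on the whole x column in row order, so when several individuals are stacked in one data frame the lags near each id boundary are taken from a different individual, and the row order here is newest year first. A minimal base R sketch of building the lags separately for each id, sorted by year, could look like this. The helper add_lags() and the name DF.lag are hypothetical names introduced for illustration, and the sketch assumes one row per id and year with no gaps.

```r
# Same toy data as in the question (ids A, B, C; years 2011-2016).
DF <- data.frame(
  id      = rep(c("A", "B", "C"), each = 6),
  year    = rep(2016:2011, times = 3),
  x       = c(376, 391, 376, 373, 347, 330,
              398, 391, 410, 393, 408, 288,
              2464, 2465, 2500, 2215, 2228, 1839),
  feature = rep(c("female", "male", "male"), each = 6)
)

# Hypothetical helper: add lagged copies of x for ONE individual's rows.
# Assumes consecutive years with no gaps.
add_lags <- function(d, n_lags = 3) {
  d <- d[order(d$year), ]  # oldest year first
  for (k in seq_len(n_lags)) {
    # Shift x down by k rows so that row t holds x(t - k); pad with NA.
    d[[paste0("xLag", k)]] <- c(rep(NA, k), head(d$x, -k))
  }
  d
}

# Apply the helper within each id, then stack the pieces again.
DF.lag <- do.call(rbind, lapply(split(DF, DF$id), add_lags))

# Keep only rows that have a full 3-year history.
DF.lag <- DF.lag[complete.cases(DF.lag), ]
rownames(DF.lag) <- NULL
DF.lag
```

With six years per individual and three lags, this leaves three usable rows per id. A data frame built this way could be passed as the data argument of the vgam() call in place of df.read2, since it uses the same column names (x, xLag1, xLag2, xLag3, feature, year).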