2015-04-02 24 views
1

我試圖從Hyndman教授的博客here中描述的兩個外部迴歸因子提取ARIMA模型的一步預測。我首先使用auto.arima生成模型,然後將此模型應用於整個集合。從auto.arima生成漂移模型(預測包)獲取一步預測時的錯誤

該代碼對於我的示例中的第一家公司應如其應用。

Error in `[<-.default`(`*tmp*`, , "drift", value = c(1.00000000000909, : 
subscript out of bounds 

以下代碼包含導致錯誤的時間系列::

df <-structure(list(fYearQtr = c(2004.5, 2004.75, 2005, 2005.25, 2005.5, 
2005.75, 2006, 2006.25, 2006.5, 2006.75, 2007, 2007.25, 2007.5, 
2007.75, 2008, 2008.25, 2008.5, 2008.75, 2009, 2009.25, 2009.5, 
2009.75, 2010, 2010.25, 2010.5, 2010.75, 2011, 2011.25, 2011.5, 
2011.75, 2012, 2012.25, 2012.5, 2012.75, 2013, 2013.25, 2013.5, 
2013.75, 2014, 2014.25), Sales = c(2014, 2350, 3490, 3243, 3520, 
3678, 5749, 4359, 4370, 4837, 7115, 5264, 5410, 6217, 9608, 7512, 
7464, 7895, 11880, 9084, 9734, 12207, 15683, 13499, 15700, 20343, 
26741, 24667, 28571, 28270, 46333, 39186, 35023, 35966, 54512, 
43603, 35323, 37472, 57594, 45646), last_SVI = c(17, 23, 25, 
20, 20, 28, 31, 22, 21, 30, 32, 22, 29, 34, 39, 26, 24, 34, 38, 
24, 28, 33, 34, 22, 38, 34, 38, 34, 34, 40, 52, 34, 34, 58, 54, 
31, 32, 53, 48, 30), SD_SVI = c(0.898717034272917, 1.66410058867569, 
2.35883500145783, 2.49615088301353, 1.48064435037847, 2.87562702192596, 
3.45854571482559, 2.26738299389972, 1.05003052458683, 3.67772226053586, 
3.19855736712181, 5.65685424949238, 2.66024868704471, 5.10153320342434, 
3.77236918007361, 2.79880927062444, 2.59437260831385, 3.0697030675746, 
4.66162731573098, 2.33973480855395, 3.43063124938119, 3.71069141390533, 
3.78255103173669, 9.43873633436932, 4.36918111203273, 3.44368615860597, 
4.85032380626706, 3.51188458428425, 2.16617351389673, 3.01066480434182, 
13.8264358990424, 5.36966789786234, 3.3166247903554, 14.2644438718921, 
7.43260316064229, 2.96777564982468, 4.21383557538856, 12.3594664228036, 
6.83880331412088, 2.01913919206257)), .Names = c("fYearQtr", 
"Sales", "xReg1", "xReg2"), row.names = c(NA, -40L), class = "data.frame") 

實施例的數據:

head(df) 
    fYearQtr Sales xReg1 xReg2 
1 2004.50 2014 17 0.898717 
2 2004.75 2350 23 1.664101 
3 2005.00 3490 25 2.358835 
4 2005.25 3243 20 2.496151 
5 2005.50 3520 20 1.480644 
6 2005.75 3678 28 2.875627 
第二牢固,但是,在提取一步法預報時導致錯誤

構建時間序列對象,列車/測試集並提取一步預測:

require(forecast) 
TS <- ts(df[,2:4], start = c(2004,3), end = c(2014,2), frequency=4) 
TS.TRAIN <- window(TS, end=2011.4) 
TS.TEST <- window(TS, start=2011.5) 

# Build an arima model 
MODEL <- auto.arima(TS.TRAIN[,'Sales'], xreg=TS.TRAIN[,colnames(TS.TRAIN) %in% c('xReg1', 'xReg2')]) 
FCAST <- forecast(MODEL, xreg=TS.TEST[,colnames(TS.TEST) %in% c('xReg1', 'xReg2')]) 
# Resulting model: ARIMA(0,1,0)(1,0,1)[4] with drift 

立即提取1步預報:

refit <- Arima(TS[,'Sales'], model=MODEL, xreg=TS[,colnames(TS) %in% c('xReg1', 'xReg2')]) 
## Error in `[<-.default`(`*tmp*`, , "drift", value = c(1.00000000000909, : 
# subscript out of bounds 

混亂的部分:

######################################### 
# Other example: works just fine? 
df_noissues <- structure(list(fQtrYear = c(2004.5, 2004.75, 2005, 2005.25, 2005.5, 
    2005.75, 2006, 2006.25, 2006.5, 2006.75, 2007, 2007.25, 2007.5, 
    2007.75, 2008, 2008.25, 2008.5, 2008.75, 2009, 2009.25, 2009.5, 
    2009.75, 2010, 2010.25, 2010.5, 2010.75, 2011, 2011.25, 2011.5, 
    2011.75, 2012, 2012.25, 2012.5, 2012.75, 2013, 2013.25, 2013.5, 
    2013.75, 2014, 2014.25), Sales = c(5818, 5979, 6221, 6410, 6401, 
6536, 7111, 7797, 7631, 7840, 7908, 8066, 7387, 7387, 6998, 7245, 
6970, 5688, 4147, 4244, 4615, 5433, 4887, 5187, 5287, 5652, 5958, 
6585, 6419, 5989, 6006, 5963, 5833, 5898, 5833, 5849, 5765, 5585, 
5454, 5836), mean_SVI = c(61.1666666666667, 47.9166666666667, 
48.5833333333333, 51.4166666666667, 56, 51.8461538461538, 50.1666666666667, 
60.75, 53.1538461538462, 48.9230769230769, 53, 53.6923076923077, 
55.8461538461538, 46.3333333333333, 51.25, 54.1666666666667, 
52.4166666666667, 50.4166666666667, 54.4166666666667, 49.3333333333333, 
49.1666666666667, 39.5833333333333, 41.8333333333333, 43.9166666666667, 
39.8333333333333, 37.1666666666667, 45.25, 45.9166666666667, 
45.8333333333333, 39.7692307692308, 52.8461538461538, 60.6153846153846, 
44.0769230769231, 37.75, 47.5, 45.1666666666667, 42.1666666666667, 
39.25, 47.25, 47.4166666666667), SD_SVI = c(9.29157324317757, 
11.0737883255365, 8.37157890324957, 6.08213977498251, 7.80442764775809, 
9.09987320283598, 6.16195561244131, 11.2583302491977, 10.4390784542678, 
8.38114489455884, 9.69535971483266, 11.4118696641159, 6.84161474417351, 
8.96795642408249, 3.22278817739603, 6.23528570947538, 4.73782330790941, 
9.3269729410149, 16.1777531496094, 10.9903538972992, 9.64679252708412, 
11.1147595020261, 11.1586357371836, 7.22946412365063, 7.99810583636507, 
6.89971453076579, 7.97866473221497, 3.89541299790439, 6.24984848301189, 
7.5294294400245, 17.0775005677361, 12.6855459844296, 6.00640683578153, 
6.77059148752228, 6.98700091728789, 6.97832140969228, 3.90415474109624, 
4.39265916563698, 3.64629326103298, 5.08935311719625)), .Names = c("fQtrYear", 
"Sales", "xReg1", "xReg2"), row.names = c(NA, -40L), class = "data.frame") 

實施例的數據:

使用以下的數據幀(不同公司)時完全相同的代碼工作
head(df_noissues) 
    fQtrYear Sales xReg1  xReg2 
1 2004.50 5818 61.16667 9.291573 
2 2004.75 5979 47.91667 11.073788 
3 2005.00 6221 48.58333 8.371579 
4 2005.25 6410 51.41667 6.082140 
5 2005.50 6401 56.00000 7.804428 
6 2005.75 6536 51.84615 9.099873  

運行相同的代碼構建測試/訓練集& ARIMA模型:

TS <- ts(df_noissues[,2:4], start = c(2004,3), end = c(2014,2), frequency=4) 
TS.TRAIN <- window(TS, end=2011.4) 
TS.TEST <- window(TS, start=2011.5) 

# Build an arima model 
MODEL <- auto.arima(TS.TRAIN[,'Sales'], xreg=TS.TRAIN[,colnames(TS.TRAIN) %in% c('xReg1', 'xReg2')]) 
FCAST <- forecast(MODEL, xreg=TS.TEST[,colnames(TS.TEST) %in% c('xReg1', 'xReg2')]) 

提取1步預測:沒有錯誤。

refit <- Arima(TS[,'Sales'], model=MODEL, xreg=TS[,colnames(TS) %in% c('xReg1', 'xReg2')]) 
## ARIMA(2,0,0)(0,1,0)[4] 

除了在模型中生成(有/無漂移)的差異,我真的不能似乎掌握什麼可能導致此。使用allowdrift = FALSE運行auto.arima似乎可以解決問題。

回答

2

問題出現在xreg參數的列名稱中。在第一種情況下,有一個漂移項,它向xreg添加一列,並觸發列名稱的更改。你可以看到這個模型。

> MODEL 
Series: TS.TRAIN[, "Sales"] 
ARIMA(0,1,0)(1,0,1)[4] with drift   

Coefficients: 
     sar1 sma1  drift xreg.xReg1 xreg.xReg2 
     0.8135 0.6554 1475.542  38.1461  84.5589 
s.e. 0.1314 0.5816 1071.337  45.7297  74.2390 

相比

> MODEL 
Series: TS.TRAIN[, "Sales"] 
ARIMA(2,0,0)(0,1,0)[4]      

Coefficients: 
     ar1  ar2  xReg1 xReg2 
     1.3394 -0.5685 -14.4072 4.3869 
s.e. 0.1766 0.1783 16.3416 24.3968 

我將它添加到列表中的錯誤,看看我能想出一個解決方案。

作爲變通,你可以用改裝漂移模型如下:

TS <- ts(df[,2:4], start = c(2004,3), end = c(2014,2), frequency=4) 
TS.TRAIN <- window(TS, end=2011.4) 
z <- TS.TRAIN[,colnames(TS.TRAIN) %in% c('xReg1', 'xReg2')] 
MODEL <- auto.arima(TS.TRAIN[,'Sales'], xreg=z) 
MODEL2 <- Arima(TS.TRAIN[,'Sales'], order=MODEL$arma[c(1,6,2)], 
       seasonal=MODEL$arma[c(3,7,4)], xreg=cbind(1:nrow(z),z)) 
z <- TS[,colnames(TS) %in% c('xReg1', 'xReg2')] 
refit <- Arima(TS[,'Sales'], model=MODEL2, xreg=cbind(1:nrow(z),z)) 
+0

現在固定https://github.com/robjhyndman/forecast/commit/2ff543368828f7474fa8eba2da06c7e51afceae2 – 2015-10-27 03:28:48