2015-09-21

Handling consecutive missing values in time series data

I have time series data that looks like this:

2015-04-26 23:00:00 5704.27388916015661380 
2015-04-27 00:00:00 4470.30868326822928793 
2015-04-27 01:00:00 4552.57241617838553793 
2015-04-27 02:00:00 4570.2225003282565
2015-04-27 03:00:00 NA 
2015-04-27 04:00:00 NA 
2015-04-27 05:00:00 NA 
2015-04-27 06:00:00 12697.37724086216439900 
2015-04-27 07:00:00 5538.71119009653739340 
2015-04-27 08:00:00 81.95060647328695325 
2015-04-27 09:00:00 8550.65816895300667966 
2015-04-27 10:00:00 2925.76573206583680076 

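How should I handle consecutive NA values? When there is only a single NA, I replace it with the mean of the values on either side of it. Is there a standard way to handle runs of consecutive missing values?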
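For reference, here is a minimal base-R sketch of the single-NA approach described above. The helper name `fill_single_na` is hypothetical. It only fills an NA whose two neighbours are both observed, which is exactly why the approach breaks down for runs: inside a gap of two or more NAs, at least one neighbour is itself missing.

```r
# Hypothetical helper: replace each isolated NA (a gap of length 1) with the
# mean of its left and right neighbours; longer runs of NAs are left as-is.
fill_single_na <- function(x) {
  n <- length(x)
  for (i in seq_len(n)) {
    if (is.na(x[i]) && i > 1 && i < n &&
        !is.na(x[i - 1]) && !is.na(x[i + 1])) {
      x[i] <- (x[i - 1] + x[i + 1]) / 2
    }
  }
  x
}

fill_single_na(c(1, NA, 3, NA, NA, 6))  # gives 1 2 3 NA NA 6
```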

Answer


The zoo package has several functions for dealing with NA values. One of the following may suit your needs:

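  • na.locf: last observation carried forward (LOCF). With the argument fromLast = TRUE, it instead carries the next observation backward (NOCB).
  • na.aggregate: replaces each NA with an aggregate of the observed values. The default aggregation function is mean, but you can specify other functions as well. See ?na.aggregate for more information.
  • na.approx: replaces each NA with a linearly interpolated value.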
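To make the three strategies concrete, here is a rough base-R sketch of what each one computes. The helper names (`locf`, `nocb`, `fill_mean`, `fill_linear`) are hypothetical and these are not the zoo implementations; in particular, zoo's functions drop leading/trailing NAs they cannot fill by default (`na.rm = TRUE`), whereas these sketches leave them as NA. Applied to `df$V2`, they should reproduce the V.loc, V.agg and V.app values in the output further down.

```r
# Hypothetical base-R sketches of the zoo strategies (not zoo's own code).

# Last observation carried forward (cf. na.locf); leading NAs stay NA.
locf <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# Next observation carried backward (cf. na.locf(..., fromLast = TRUE)).
nocb <- function(x) rev(locf(rev(x)))

# Replace every NA with the mean of the observed values (cf. na.aggregate).
fill_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Linear interpolation between observed neighbours (cf. na.approx).
# stats::approx needs at least two non-NA values; with its default rule = 1,
# positions before the first or after the last observation stay NA.
fill_linear <- function(x) {
  idx <- seq_along(x)
  ok <- !is.na(x)
  approx(idx[ok], x[ok], xout = idx)$y
}

x <- c(1, NA, NA, 4)
locf(x)         # 1 1 1 4
nocb(x)         # 1 4 4 4
fill_mean(x)    # 1.0 2.5 2.5 4.0
fill_linear(x)  # 1 2 3 4
```

Only the interpolation uses information from both sides of the gap, which is why it is the natural generalisation of averaging the two neighbours of a single NA.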

You can compare the results to see what each of these functions does:

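library(zoo) 
df$V.loc <- na.locf(df$V2)       # last observation carried forward 
df$V.agg <- na.aggregate(df$V2)  # NAs replaced by the mean of V2 
df$V.app <- na.approx(df$V2)     # NAs replaced by linear interpolation 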

This results in:

> df 
                    V1          V2       V.loc       V.agg       V.app
1  2015-04-26 23:00:00  5704.27389  5704.27389  5704.27389  5704.27389
2  2015-04-27 00:00:00  4470.30868  4470.30868  4470.30868  4470.30868
3  2015-04-27 01:00:00  4552.57242  4552.57242  4552.57242  4552.57242
4  2015-04-27 02:00:00  4570.22250  4570.22250  4570.22250  4570.22250
5  2015-04-27 03:00:00          NA  4570.22250  5454.64894  6602.01119
6  2015-04-27 04:00:00          NA  4570.22250  5454.64894  8633.79987
7  2015-04-27 05:00:00          NA  4570.22250  5454.64894 10665.58856
8  2015-04-27 06:00:00 12697.37724 12697.37724 12697.37724 12697.37724
9  2015-04-27 07:00:00  5538.71119  5538.71119  5538.71119  5538.71119
10 2015-04-27 08:00:00    81.95061    81.95061    81.95061    81.95061
11 2015-04-27 09:00:00  8550.65817  8550.65817  8550.65817  8550.65817
12 2015-04-27 10:00:00  2925.76573  2925.76573  2925.76573  2925.76573
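As a quick check on the V.app and V.agg columns, the arithmetic can be worked by hand using the rounded values from the printed output. The interpolated values lie on the straight line between row 4 (the last observation before the gap) and row 8 (the first one after it), and the aggregate is the mean of the nine observed values:

$$\hat{x}_t = x_4 + \frac{t-4}{8-4}\,(x_8 - x_4), \qquad t = 5, 6, 7$$

$$\frac{x_8 - x_4}{4} = \frac{12697.37724 - 4570.22250}{4} = \frac{8127.15474}{4} \approx 2031.78869$$

$$\hat{x}_5 \approx 6602.01119, \qquad \hat{x}_6 \approx 8633.79987, \qquad \hat{x}_7 \approx 10665.58856$$

$$\bar{x} = \frac{1}{9}\sum_{t \notin \{5,6,7\}} x_t \approx \frac{49091.84043}{9} \approx 5454.64894$$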

Data used:

df <- structure(list(V1 = structure(c(1430082000, 1430085600, 1430089200, 1430092800, 1430096400, 1430100000, 1430103600, 1430107200, 1430110800, 1430114400, 1430118000, 1430121600), class = c("POSIXct", "POSIXt"), tzone = ""), V2 = c(5704.27388916016, 4470.30868326823, 4552.57241617839, 4570.22250032826, NA, NA, NA, 12697.3772408622, 5538.71119009654, 81.950606473287, 8550.65816895301, 2925.76573206584)), .Names = c("V1", "V2"), row.names = c(NA, -12L), class = "data.frame")