2015-10-23

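Finding the maximum of a data.table column within a range of rows around the current observation, by group

Alright, the title is quite a mouthful, but here is a problem I solved, and I'm curious whether anyone has a better solution or can generalize it further.

I have a time series stored as a data.table, and I want to know whether an observation "stands out from the trend", so to speak, relative to the data before and after it. That is, is this observation larger than the observations in the years before and after it?

To do this, my idea is to build another column that holds the maximum value taken from the rows above and below, and then check whether each row's value equals that maximum.

Fortunately my data is regularly spaced, meaning every row is the same distance in time from its neighbours. I use this fact to specify the window size manually as a number of rows, rather than checking whether each row falls within the time distance of interest.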

####################### 
# Package Loading 
usePackage <- function(p) { 
    if (!is.element(p, installed.packages()[,1])) 
    install.packages(p, dep = TRUE) 
    require(p, character.only = TRUE) 
} 

packages <- c("data.table","lubridate") 
for(package in packages) usePackage(package) 
rm(packages,usePackage) 
####################### 

set.seed(1337) 

# creating a data.table 
mydt <- data.table(Name = c(rep("Roger", 12), rep("Johnny", 8), "Mark"),
                   Date = c(seq(ymd('2010-06-15'), ymd('2015-12-15'), by = '6 month'),
                            seq(ymd('2012-06-15'), ymd('2015-12-15'), by = '6 month'),
                            ymd('2015-12-15')))

mydt[ , Value := c(rnorm(12,15,1),rnorm(8,30,2),rnorm(1,100,30))] 
setkey(mydt, Name, Date) 

# setting the number of rows up or down to check 
windowSize <- 2 

# applying the windowing max function 
mydt[, 
    windowMax := unlist(lapply(1:.N, function(x) max(.SD[Filter(function(y) y>0 & y <= .N, unique(abs(x+(-windowSize:windowSize)))), Value]))), 
    by = Name] 

# checking if a value is the local max (by window) 
mydt[, isMaxValue := windowMax == Value] 
mydt 

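As you can see, the windowing function is a mess, but it does work. My questions are: do you know a simpler, more concise, or more readable way to do the same thing? And do you know how to generalize this to irregular time series (i.e. not a fixed window of rows)? I couldn't get zoo::rollapply to do what I wanted, but I don't have much experience with it (I couldn't get around a group with a single row making the function crash).

Let me know what you think, and thanks!

Answers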

1

This doesn't really address the time-window part, but if you want a one-liner using zoo::rollapply, you could do:

library(zoo) # provides rollapply

width <- 2 * windowSize + 1 # one central obs. plus windowSize (here two) on each side

mydt[, isMaxValue2 := rollapply(Value, width, max, partial = TRUE) == Value, by=Name] 
identical(mydt$isMaxValue, mydt$isMaxValue2) # TRUE 

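I find it somewhat clearer than the solution you proposed.

The partial = TRUE argument handles the "boundary effects" at the start and end of each group, where fewer than 5 observations are available in the window.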
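To make the effect of partial = TRUE concrete, here is a minimal sketch on a toy vector (the vector `x` and the width of 3 are made up for illustration and are not from the question's data). It assumes rollapply's default centered alignment.

```r
library(zoo)

x <- c(3, 1, 4, 1, 5)  # hypothetical toy data

# Centered window of width 3. With partial = TRUE the first and last
# positions use the two values that are available instead of being dropped,
# so the result has the same length as the input.
with_partial <- rollapply(x, width = 3, FUN = max, partial = TRUE)
with_partial

# Without partial = TRUE only complete windows are evaluated, so the
# result is shorter than the input and cannot be compared row by row.
without_partial <- rollapply(x, width = 3, FUN = max)
without_partial
```

Keeping the result the same length as the input is what allows the `== Value` comparison in the one-liner to line up row by row within each group.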

1

I think something like rollapply (@hfty's answer) makes more sense, but here is another way:

mydt[, wmax := do.call(pmax, c(
    shift(Value, 2:1, type = "lag"), 
    shift(Value, 0:2, type = "lead"), 
    list(na.rm = TRUE) 
)), by=Name] 

This seems to work:

      Name                Date     Value windowMax      wmax
 1: Johnny 2012-06-14 20:00:00  30.31510  32.97827  32.97827
 2: Johnny 2012-12-14 19:00:00  32.97827  32.97827  32.97827
 3: Johnny 2013-06-14 20:00:00  29.84842  32.97827  32.97827
 4: Johnny 2013-12-14 19:00:00  32.54356  32.97827  32.97827
 5: Johnny 2014-06-14 20:00:00  31.28335  33.72532  33.72532
 6: Johnny 2014-12-14 19:00:00  31.60152  33.72532  33.72532
 7: Johnny 2015-06-14 20:00:00  33.72532  33.72532  33.72532
 8: Johnny 2015-12-14 19:00:00  28.90929  33.72532  33.72532
 9:   Mark 2015-12-14 19:00:00 118.57833 118.57833 118.57833
10:  Roger 2010-06-14 20:00:00  15.19249  15.19249  15.19249
11:  Roger 2010-12-14 19:00:00  13.55330  16.62230  16.62230
12:  Roger 2011-06-14 20:00:00  14.67682  16.62230  16.62230
13:  Roger 2011-12-14 19:00:00  16.62230  17.04212  17.04212
14:  Roger 2012-06-14 20:00:00  14.31098  17.04212  17.04212
15:  Roger 2012-12-14 19:00:00  17.04212  17.08193  17.08193
16:  Roger 2013-06-14 20:00:00  15.94378  17.08193  17.08193
17:  Roger 2013-12-14 19:00:00  17.08193  17.08193  17.08193
18:  Roger 2014-06-14 20:00:00  16.91712  17.08193  17.08193
19:  Roger 2014-12-14 19:00:00  14.58519  17.08193  17.08193
20:  Roger 2015-06-14 20:00:00  16.03285  16.91712  16.91712
21:  Roger 2015-12-14 19:00:00  13.32143  16.03285  16.03285
      Name                Date     Value windowMax      wmax

To see how it works, we can look at the vectors that are passed to pmax:

mydt[, c(
    shift(Value, 2:1, type = "lag"), 
    shift(Value, 0:2, type = "lead") 
), by=Name] 


#       Name       V1       V2        V3       V4       V5
#  1: Johnny       NA       NA  30.31510 32.97827 29.84842
#  2: Johnny       NA 30.31510  32.97827 29.84842 32.54356
#  3: Johnny 30.31510 32.97827  29.84842 32.54356 31.28335
#  4: Johnny 32.97827 29.84842  32.54356 31.28335 31.60152
#  5: Johnny 29.84842 32.54356  31.28335 31.60152 33.72532
#  6: Johnny 32.54356 31.28335  31.60152 33.72532 28.90929
#  7: Johnny 31.28335 31.60152  33.72532 28.90929       NA
#  8: Johnny 31.60152 33.72532  28.90929       NA       NA
#  9:   Mark       NA       NA 118.57833       NA       NA
# 10:  Roger       NA       NA  15.19249 13.55330 14.67682
# 11:  Roger       NA 15.19249  13.55330 14.67682 16.62230
# 12:  Roger 15.19249 13.55330  14.67682 16.62230 14.31098
# 13:  Roger 13.55330 14.67682  16.62230 14.31098 17.04212
# 14:  Roger 14.67682 16.62230  14.31098 17.04212 15.94378
# 15:  Roger 16.62230 14.31098  17.04212 15.94378 17.08193
# 16:  Roger 14.31098 17.04212  15.94378 17.08193 16.91712
# 17:  Roger 17.04212 15.94378  17.08193 16.91712 14.58519
# 18:  Roger 15.94378 17.08193  16.91712 14.58519 16.03285
# 19:  Roger 17.08193 16.91712  14.58519 16.03285 13.32143
# 20:  Roger 16.91712 14.58519  16.03285 13.32143       NA
# 21:  Roger 14.58519 16.03285  13.32143       NA       NA
#       Name       V1       V2        V3       V4       V5
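
The lag and lead offsets above are hard-coded for a window of 2 rows. As a sketch only, the same shift/pmax idea can be wrapped in a small helper that takes the window size as a parameter. The helper name `window_max` and the toy table `toy` are hypothetical and not part of the original answer; shift() is called once per offset so the pieces always combine into a plain list, whatever the window size.

```r
library(data.table)

# Hypothetical helper: max of x over k rows before and k rows after each row.
window_max <- function(x, k) {
  cols <- c(
    lapply(seq_len(k), function(i) shift(x, i, type = "lag")),   # rows above
    list(x),                                                     # current row
    lapply(seq_len(k), function(i) shift(x, i, type = "lead"))   # rows below
  )
  # pmax takes the row-wise maximum; na.rm = TRUE ignores the NA padding
  # that shift() produces at the edges of each group.
  do.call(pmax, c(cols, list(na.rm = TRUE)))
}

# Toy data (made up): group "b" has a single row, like "Mark" in the question.
toy <- data.table(Name  = c("a", "a", "a", "a", "b"),
                  Value = c(1, 5, 2, 3, 9))

windowSize <- 1
toy[, wmax := window_max(Value, windowSize), by = Name]
toy[, isMaxValue := wmax == Value]
toy
```

With `windowSize <- 2` this should reproduce the behaviour of the hard-coded version, and a single-row group simply gets its own value back.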
+1 Nifty! I always forget about `shift()`, which is new in data.table v1.9.6 (released 19 September 2015). – cocquemas
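
Neither answer covers the irregular-series part of the question. One possible approach, sketched here under assumptions, is to define the window in days rather than rows and compare each row's Date with every other Date in its group. The table `irr`, the variable `maxDays`, and the 30-day half-width are all made up for illustration. The comparison is quadratic in the group size, so it is only reasonable for small groups; a non-equi join would likely scale better.

```r
library(data.table)

# Hypothetical irregular series: dates are not evenly spaced.
irr <- data.table(
  Name  = c("a", "a", "a", "a", "b"),
  Date  = as.Date(c("2015-01-01", "2015-01-20", "2015-06-01",
                    "2015-06-10", "2015-03-01")),
  Value = c(4, 7, 3, 2, 9)
)
setkey(irr, Name, Date)

maxDays <- 30  # half-width of the time window in days (assumed)

# For each row, take the max of Value over the rows of the same group whose
# Date lies within maxDays of that row's Date (the row itself included).
irr[, windowMax := vapply(seq_len(.N), function(i) {
  max(Value[abs(as.numeric(Date - Date[i])) <= maxDays])
}, numeric(1)), by = Name]

irr[, isMaxValue := windowMax == Value]
irr
```

Because every row is always inside its own window, a group with a single row needs no special handling.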
