2013-12-23 102 views
3

假設計算事件我有x = rnorm(100000)和,而不是做一個1000長度滑動窗口均線,我想這樣做統計完所有倍x高於0.2一個1000長度滑動窗口。快速的方式在滑動窗口

例如,

x <- rnorm(1004) 
start <- 1:1000 
record <- list() 
while(start[length(start)] <= length(x)) { 
    record[[length(record) + 1]] <- length(which(x[start] > 0.2))/length(start) 
    start <- start + 1 
    print(record[[length(record)]]);flush.console() 
} 

這變得unmanegable對於大length(x)。什麼是高效方法?

回答

4

我的貢獻是計算

cumdiff = function(x) diff(c(0, cumsum(x > .2)), 20) 

它隨着

filt = function(x) filter(x > 0.2, rep(1, 20), sides=1) 
library(TTR); ttr = function(x) runSum(x > .2, 20) 
cumsub = function(x) { z <- cumsum(c(0, x>0.2)); tail(z,-20) - head(z,-20) } 

執行確定

> library(microbenchmark) 
> set.seed(123); xx = rnorm(100000) 
> microbenchmark(cumdiff(xx), filt(xx), ttr(xx), cumsub(xx)) 
Unit: milliseconds 
     expr  min  lq median  uq  max neval 
cumdiff(xx) 11.192005 12.387862 12.469253 12.77588 13.72404 100 
    filt(xx) 20.979503 22.058045 22.442765 23.02612 62.91730 100 
    ttr(xx) 8.390923 10.023934 10.119772 10.46309 11.04173 100 
    cumsub(xx) 7.015654 8.483432 8.538171 8.73596 9.65421 100 

這些不同的條件的累積總和之間的滯後差在結果如何表示的細節(filtttr具有領先的NA,例如),只有filter涉及嵌入式NA的

> xx[22] = NA 
> head(cumdiff(xx)) # NA's propagate, silently 
[1] 9 9 NA NA NA NA 
> ttr(xx) 
Error in runSum(x > 0.2, 20) : Series contains non-leading NAs 
> tail(filt(xx), -19) 
[1] 9 9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 8 8 9 
... 
+2

這甚至比'ttr'快約25%:'˚F< - 功能( x){z < - cumsum(c(0,x> 0.2));尾(z,-20) - 頭(z,-20)}' –

1

一種選擇是rollaply使用:

library(zoo) 
rollapply(zoo(x), width=1000, FUN=function(x) sum(x>0.2)) 

UPD

同樣的解決方案是TTR更快:

library(TTR) 
runSum(rnorm(n)>0.2, 1000) 

一些基準測試(F1 - 動物園的解決方案,F2 - 原始解決方案,f3 - TTR解決方案):

benchmark(f1(5e4), f2(5e4), f3(5e4), replications=10) 
     test replications elapsed relative user.self sys.self user.child sys.child 
1 f1(50000)   10 29.39 326.556  29.36  0   NA  NA 
2 f2(50000)   10 55.57 617.444  55.50  0   NA  NA 
3 f3(50000)   10 0.09 1.000  0.09  0   NA  NA 
3

例如起見我使用長度爲100的向量和超過20個元素

x = rnorm(100) 
y <- x > 0.2 
z <- filter(y, rep(1,20), sides=1) 

filter命令是這裏的特技的窗口計數。 filter以矢量y並遍歷每個元素。 rep(1,20)告訴它一次取20個元素,將它們乘以1,並將它們相加(例如,您可以重新(1/20,20)來得到平均值)。 sides = 1指示過濾器查看元素的右側,即考慮當前元素之前的20個元素。

> x 
    [1] -0.561107013 -0.687590079 -0.512261471 0.335243662 -2.483070553 -0.995055193 1.711199893 0.165077559 -0.130174687 
[10] 0.671520592 -0.034204031 0.793000050 -1.407499532 0.929966998 0.464658383 1.291478397 1.122711651 -0.732289539 
[19] -0.947808154 1.601688862 0.755147769 0.962415501 -0.254079608 -0.283894519 1.091660426 0.926601058 0.249405592 
[28] -1.195970776 -0.384030894 -0.826048400 1.065626378 -1.935038267 1.464437755 0.675437142 0.201552948 0.367500849 
[37] -0.232627068 -1.472755498 -1.894668101 -1.173382831 -0.339885920 -0.248978811 -0.378980900 1.545109833 0.280077032 
[46] 0.684595064 1.867086057 0.320364730 -1.653199399 -1.365201143 -0.012793364 -0.874006845 -0.271929369 -0.119850417 
[55] -0.361720731 0.361642939 0.676907910 0.816370133 0.685732967 -0.868583473 -1.830861530 1.423500571 -0.760462112 
[64] 0.517177871 0.229760540 -0.187106487 -0.324125717 -0.515600949 -0.772414150 -1.144808696 0.003509157 0.138944467 
[73] -0.386850873 0.268739173 0.348214296 0.431076350 -0.418408389 -1.398493830 -0.331882279 -0.203976286 -0.646023792 
[82] -0.729974273 -0.744186350 0.877314843 -0.994216252 0.022162018 -1.101533562 -1.328293297 0.974986850 -0.152985355 
[91] -0.714722527 0.428360138 -0.055607165 -1.015410904 0.320052726 1.951380333 2.288315801 1.295683645 -2.494534350 
[100] 1.171319898 
> y 
    [1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE 
[21] TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE 
[41] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE 
[61] FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE 
[81] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE 
> z 
Time Series: 
Start = 1 
End = 100 
Frequency = 1 
    [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 9 10 11 11 10 11 12 12 12 12 11 12 11 12 12 12 12 11 11 11 10 
[41] 9 8 8 9 9 9 9 10 10 10 9 9 8 7 6 6 7 8 9 9 9 10 10 10 10 9 8 7 7 7 7 7 7 8 9 9 8 7 6 6 
[81] 6 5 5 5 4 4 4 4 5 5 5 6 6 5 5 5 6 7 7 8