2015-09-10 28 views
7

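Cumulative sum that resets when a 0 is encountered

I want to compute a cumulative sum over a column, but reset the running total whenever a 0 is encountered.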

Here is an example of what I want:

data.frame(campaign = letters[1:4] , 
     date=c("jan","feb","march","april"), 
     b = c(1,0,1,1) , 
     whatiwant = c(1,0,1,2) 
     ) 

  campaign  date b whatiwant
1        a   jan 1         1
2        b   feb 0         0
3        c march 1         1
4        d april 1         2
+1

The answers to this question I asked a few weeks ago (http://stackoverflow.com/questions/32247414/create-sequential-counter-that-restart-on-a-a-condition-within-panel-data-groups) should help you solve this. – ulfelder

Answers

12

Another base R option would simply be

with(df, ave(b, cumsum(b == 0), FUN = cumsum)) 
## [1] 1 0 1 2 

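This simply splits column b into groups at each occurrence of 0 and computes the cumulative sum of b within each of those groups.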
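To see why the grouping works, it can help to look at the grouping vector on its own. The following is a minimal sketch using the example column from the question; the variable names `grp` and `res` are only for illustration.

```r
# Example column from the question
b <- c(1, 0, 1, 1)

# Each 0 increments the running count, so every 0 starts a new group
grp <- cumsum(b == 0)
grp
# [1] 0 1 1 1

# ave() applies cumsum() within each group and keeps the original order
res <- ave(b, grp, FUN = cumsum)
res
# [1] 1 0 1 2
```

Because each 0 is the first element of its group, the per-group cumulative sum restarts at 0 there.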


Another solution, using a recent data.table version (v 1.9.6+):

library(data.table) ## v 1.9.6+ 
setDT(df)[, whatiwant := cumsum(b), by = rleid(b == 0L)] 
#    campaign  date b whatiwant
# 1:        a   jan 1         1
# 2:        b   feb 0         0
# 3:        c march 1         1
# 4:        d april 1         2
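For readers unfamiliar with rleid(): it assigns a new group id each time the value changes from one row to the next. A rough base R equivalent for a single vector, written here only as an illustrative sketch (the helper name `run_id` is hypothetical and not part of data.table), might look like this:

```r
# Hypothetical helper mimicking the run ids rleid() gives for one vector
run_id <- function(v) {
  r <- rle(v)                            # lengths of runs of equal values
  rep(seq_along(r$lengths), r$lengths)   # repeat each run's index over its length
}

b <- c(1, 0, 1, 1)
ids <- run_id(b == 0)
ids
# [1] 1 2 3 3

# Cumulative sum within each run
res <- ave(b, ids, FUN = cumsum)
res
# [1] 1 0 1 2
```

Unlike grouping by cumsum(b == 0), each run of zeros forms its own group here, and its cumulative sum simply stays at 0.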

Some benchmarks, per the comments:

set.seed(123) 
x <- sample(0:1e3, 1e7, replace = TRUE) 
system.time(res1 <- ave(x, cumsum(x == 0), FUN = cumsum)) 
# user system elapsed 
# 1.54 0.24 1.81 
system.time(res2 <- Reduce(function(x, y) if (y == 0) 0 else x+y, x, accumulate=TRUE)) 
# user system elapsed 
# 33.94 0.39 34.85 
library(data.table) 
system.time(res3 <- data.table(x)[, whatiwant := cumsum(x), by = rleid(x == 0L)]) 
# user system elapsed 
# 0.20 0.00 0.21 

identical(res1, as.integer(res2)) 
## [1] TRUE 
identical(res1, res3$whatiwant) 
## [1] TRUE 
+1

It's annoying that 'cumsum' has to be computed twice. :-/ –

+0

@KonradRudolph see the benchmarks above. –

+0

You could try 'with(rle(df1$b != 0), sequence(lengths) * rep(values, lengths))' – akrun

4

You can use Reduce with a custom function that returns 0 when the new value is 0 and otherwise adds the new value to the accumulated value:

Reduce(function(x, y) if (y == 0) 0 else x+y, c(1, 0, 1, 1), accumulate=TRUE) 
# [1] 1 0 1 2 
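The same logic can be spelled out as an explicit loop, which may make the reset rule easier to follow. This is only a sketch; the function name `cumsum_reset` is made up for illustration.

```r
# Hypothetical helper: running sum that resets to 0 at every 0
cumsum_reset <- function(x) {
  out <- numeric(length(x))
  total <- 0
  for (i in seq_along(x)) {
    # Reset on a 0, otherwise add the new value to the running total
    total <- if (x[i] == 0) 0 else total + x[i]
    out[i] <- total
  }
  out
}

cumsum_reset(c(1, 0, 1, 1))
# [1] 1 0 1 2
```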
5

Another late idea:

ff = function(x) 
{ 
    cs = cumsum(x) 
    cs - cummax((x == 0) * cs) 
} 
ff(c(0, 1, 3, 0, 0, 5, 2)) 
#[1] 0 1 4 0 0 5 7 
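Breaking ff() into its intermediate steps shows what is subtracted at each position. The sketch below reuses the same input; the names `cs`, `at_zero` and `offset` are illustrative. It also appears to rely on the values being non-negative, since cummax() only tracks the most recent zero when the running total never decreases.

```r
x <- c(0, 1, 3, 0, 0, 5, 2)

cs <- cumsum(x)             # plain running total
cs
# [1]  0  1  4  4  4  9 11

at_zero <- (x == 0) * cs    # running total at the zeros, 0 elsewhere
at_zero
# [1] 0 0 0 4 4 0 0

offset <- cummax(at_zero)   # total at the most recent zero, carried forward
offset
# [1] 0 0 0 4 4 4 4

res <- cs - offset
res
# [1] 0 1 4 0 0 5 7

# Assumption check: with a negative value the offset is no longer the latest one
y <- c(5, 0, -3, 0, 1)
res_neg <- cumsum(y) - cummax((y == 0) * cumsum(y))
res_neg
# [1]  5  0 -3 -3 -2
```

For `y`, a reset at the second 0 would give 5 0 -3 0 1, so the grouping-based approaches are the safer choice when negative values can occur.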

And a comparison:

library(data.table) 
ffdt = function(x) 
    data.table(x)[, whatiwant := cumsum(x), by = rleid(x == 0L)]$whatiwant 

x = as.numeric(x) ##because 'cumsum' causes integer overflow 
identical(ff(x), ffdt(x)) 
#[1] TRUE 
microbenchmark::microbenchmark(ff(x), ffdt(x), times = 25) 
#Unit: milliseconds
#    expr      min       lq   median       uq      max neval
#   ff(x) 315.8010 362.1089 372.1273 386.3892 405.5218    25
# ffdt(x) 374.6315 407.2754 417.6675 447.8305 534.8153    25
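On the as.numeric() conversion in the comparison: ff() takes cumsum() over the whole vector before subtracting, so with 1e7 integers averaging about 500 the running total exceeds what an R integer can hold. A small sketch of that behaviour, with illustrative variable names:

```r
# cumsum() on an integer vector returns integers, capped at .Machine$integer.max
big <- c(.Machine$integer.max, 1L)

# Warns about integer overflow and yields NA from that point on
int_res <- suppressWarnings(cumsum(big))
int_res
# [1] 2147483647         NA

# Doubles have room for the larger total
num_res <- cumsum(as.numeric(big))
num_res
# [1] 2147483647 2147483648
```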