2017-06-23 40 views
2

Counting with a condition, by group

I have a data frame of values, sorted by groupID and date:

d1 <- data.frame(groupID = c(1,1,1,1,1,3,3,3,3), 
       date = c(1,2,3,4,5,6,7,8,9), 
       value = c(1,1,25,1,1,25,1,25,1)) 

> d1
groupID date value
      1    1     1
      1    2     1
      1    3    25
      1    4     1
      1    5     1
      3    6    25
      3    7     1
      3    8    25
      3    9     1

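I want to create two new columns:

  1. For each occurrence of 25, the count of preceding values equal to 1, within each group
  2. For each occurrence of 25, the count of values equal to 1 that follow it and precede the next 25, within each group

Desired output: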

groupID date value Prev1s After1s
      1    1     1
      1    2     1
      1    3    25      2       2
      1    4     1
      1    5     1
      3    6    25      0       1
      3    7     1
      3    8    25      1       1
      3    9     1

I was able to do this in Excel by creating a counter and taking the previous value. I have tried to achieve the same in R using sum and shift(), but to no avail.

Take a look at the 'rle' function.

By the way, the second 'Prev1s' should be 2, not 0.

No, it should be 0 (the counts are grouped by groupID).
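
To illustrate the 'rle' suggestion and the grouping point made in the comments, here is a minimal sketch of what rle returns for each group's values. The name `runs` is just an illustrative choice, not something from the thread. Group 3 starts with a 25, so no run of 1s precedes it, which is why its first Prev1s is 0.

```r
# Rebuild the example data from the question
d1 <- data.frame(groupID = c(1,1,1,1,1,3,3,3,3),
                 date = c(1,2,3,4,5,6,7,8,9),
                 value = c(1,1,25,1,1,25,1,25,1))

# Run-length encode the values separately for each groupID
# (`runs` is a hypothetical name used only for this illustration)
runs <- lapply(split(d1$value, d1$groupID), rle)

runs[["1"]]$values   # 1 25 1
runs[["1"]]$lengths  # 2 1 2
runs[["3"]]$values   # 25 1 25 1
runs[["3"]]$lengths  # 1 1 1 1
```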

Answers

1

You can do this with dplyr...

library(dplyr)

# First set up grouping variables based on the runs before and after each 25
d1 <- d1 %>%
  mutate(PrevGp = cumsum(lag(value == 25, default = 1)),
         AfterGp = cumsum(value == 25)) %>%
  # Use these to calculate the values you want for each group
  group_by(groupID, PrevGp) %>%
  mutate(Prev1s = sum(value) - 25) %>%
  group_by(groupID, AfterGp) %>%
  mutate(After1s = sum(value) - 25) %>%
  ungroup() %>%
  # Remove values (set to "") other than for value == 25
  mutate(Prev1s = replace(Prev1s, value != 25, ""),
         After1s = replace(After1s, value != 25, "")) %>%
  # And remove the grouping variables
  select(-c(PrevGp, AfterGp))

d1
# A tibble: 9 x 5
  groupID  date value Prev1s After1s
    <dbl> <dbl> <dbl> <chr>  <chr>
1       1     1     1
2       1     2     1
3       1     3    25 2      2
4       1     4     1
5       1     5     1
6       3     6    25 0      1
7       3     7     1
8       3     8    25 1      1
9       3     9     1
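
Note that the `sum(value) - 25` step only gives a count because every value other than 25 is exactly 1 in this data. If that might not hold, one possible variant is to count `value == 1` explicitly. The sketch below does this in base R with `ave` rather than dplyr, reusing the same grouping idea. The names `is25`, `ones`, `prev_gp` and `after_gp` are illustrative assumptions, and unmatched rows are set to NA rather than "".

```r
# Rebuild the example data from the question
d1 <- data.frame(groupID = c(1,1,1,1,1,3,3,3,3),
                 date = c(1,2,3,4,5,6,7,8,9),
                 value = c(1,1,25,1,1,25,1,25,1))

is25 <- d1$value == 25
ones <- as.integer(d1$value == 1)   # 1 where the value is exactly 1, else 0

# Same run ids as PrevGp / AfterGp: a new "previous" run starts on the row
# after each 25, and a new "after" run starts on each 25 itself
prev_gp  <- cumsum(c(TRUE, head(is25, -1)))
after_gp <- cumsum(is25)

# Count the 1s within each (groupID, run) combination
d1$Prev1s  <- ave(ones, d1$groupID, prev_gp,  FUN = sum)
d1$After1s <- ave(ones, d1$groupID, after_gp, FUN = sum)

# Keep the counts only on the rows where value is 25
d1$Prev1s[!is25]  <- NA
d1$After1s[!is25] <- NA

d1$Prev1s   # NA NA 2 NA NA 0 NA 1 NA
d1$After1s  # NA NA 2 NA NA 1 NA 1 NA
```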

Thanks! I have been trying to do this for over a week!

0

An alternative, combining the rle function with the data.table package:

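library(data.table)
setDT(d1)[, c('prev1s', 'after1s') := {
  # Run-length encode the values of this group (two copies: previous / after)
  p <- a <- rle(value)
  i <- p$values == 25
  # For each run of 25, take the length of the run before it and after it
  p$values[i] <- shift(p$lengths, fill = 0)[i]
  a$values[i] <- shift(a$lengths, type = 'lead', fill = 0)[i]
  # All other runs get NA
  p$values[!i] <- a$values[!i] <- NA
  # Expand the runs back to one value per row
  list(inverse.rle(p), inverse.rle(a))
}, by = groupID][]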

This gives:

   groupID date value prev1s after1s
1:       1    1     1     NA      NA
2:       1    2     1     NA      NA
3:       1    3    25      2       2
4:       1    4     1     NA      NA
5:       1    5     1     NA      NA
6:       3    6    25      0       1
7:       3    7     1     NA      NA
8:       3    8    25      1       1
9:       3    9     1     NA      NA
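
For readers without data.table, the same run-length idea can be sketched in base R. This is only an illustration under the same assumption as above (values are either 1 or 25, so the run next to a 25 is a run of 1s). The helper `run_neighbour` is a hypothetical name, and `ave` stands in for the `by = groupID` grouping.

```r
# Rebuild the example data from the question
d1 <- data.frame(groupID = c(1,1,1,1,1,3,3,3,3),
                 date = c(1,2,3,4,5,6,7,8,9),
                 value = c(1,1,25,1,1,25,1,25,1))

# Hypothetical helper: for each run of 25, return the length of the
# neighbouring run ("lag" = the run before, "lead" = the run after);
# every other row gets NA
run_neighbour <- function(value, type = c("lag", "lead")) {
  type <- match.arg(type)
  r <- rle(value)
  n <- length(r$lengths)
  # Shift the run lengths by one position, padding with 0 at the open end
  neighbour <- if (type == "lag") c(0, r$lengths[-n]) else c(r$lengths[-1], 0)
  # Expand back to one value per original row
  rep(ifelse(r$values == 25, neighbour, NA), r$lengths)
}

# ave() applies the helper within each groupID and keeps the row order
d1$prev1s  <- ave(d1$value, d1$groupID, FUN = function(v) run_neighbour(v, "lag"))
d1$after1s <- ave(d1$value, d1$groupID, FUN = function(v) run_neighbour(v, "lead"))

d1$prev1s   # NA NA 2 NA NA 0 NA 1 NA
d1$after1s  # NA NA 2 NA NA 1 NA 1 NA
```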