2016-06-23 54 views
5

Lag/lead within groups in R and dplyr. I am having trouble lagging by team.

Data:

df <- data.frame(Team = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"), 
      Date = c("2016-05-10","2016-05-10", "2016-05-10", "2016-05-10", 
         "2016-05-12", "2016-05-12", "2016-05-12", 
         "2016-05-15","2016-05-15", 
         "2016-05-30", "2016-05-30"), 
      Points = c(1,4,3,2,1,5,6,1,2,3,9) 
      ) 

Team  Date  Points 
A  2016-05-10  1 
A  2016-05-10  4 
A  2016-05-10  3 
A  2016-05-10  2 
B  2016-05-12  1 
B  2016-05-12  5 
B  2016-05-12  6 
C  2016-05-15  1 
C  2016-05-15  2 
D  2016-05-30  3 
D  2016-05-30  9 

Expected result:

Team  Date  Points Date_Lagged 
A  2016-05-10  1   NA 
A  2016-05-10  4   NA 
A  2016-05-10  3   NA 
A  2016-05-10  2   NA 
B  2016-05-12  1  2016-05-10 
B  2016-05-12  5  2016-05-10 
B  2016-05-12  6  2016-05-10 
C  2016-05-15  1  2016-05-12 
C  2016-05-15  2  2016-05-12 
D  2016-05-30  3  2016-05-15 
D  2016-05-30  9  2016-05-15 

I have been scratching my head over this, and I realize that the following is not the right solution:

df %>% group_by(Date) %>% mutate(Date_lagged = lag(Date)) 

Any ideas on how to fix it?

Answers

6

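By default, lag offsets by n = 1. However, 'Team' and 'Date' contain duplicated elements. To get the expected output, we need to take the distinct rows of 'Team' and 'Date', create 'Date_Lagged' as the lag of 'Date', and then right_join (or left_join) the result with the original dataset.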

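library(dplyr)

# Lag over one row per Team/Date pair, then join back onto every original row
distinct(df, Team, Date) %>%
  mutate(Date_Lagged = lag(Date)) %>%
  right_join(df, by = c("Team", "Date")) %>%
  select(Team, Date, Points, Date_Lagged)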
# Team  Date Points Date_Lagged 
#1  A 2016-05-10  1  <NA> 
#2  A 2016-05-10  4  <NA> 
#3  A 2016-05-10  3  <NA> 
#4  A 2016-05-10  2  <NA> 
#5  B 2016-05-12  1 2016-05-10 
#6  B 2016-05-12  5 2016-05-10 
#7  B 2016-05-12  6 2016-05-10 
#8  C 2016-05-15  1 2016-05-12 
#9  C 2016-05-15  2 2016-05-12 
#10 D 2016-05-30  3 2016-05-15 
#11 D 2016-05-30  9 2016-05-15 
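
For contrast, here is a small sketch of why the grouped attempt from the question falls short. It rebuilds the question's data (with `stringsAsFactors = FALSE`, which is my assumption so that the dates stay plain character strings) and shows that `lag()` inside `group_by(Date)` only shifts values within each date group, so it can never see the previous group's date.

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE is an assumption
# made here so that Date is a character column on every R version.
df <- data.frame(
  Team   = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"),
  Date   = c(rep("2016-05-10", 4), rep("2016-05-12", 3),
             rep("2016-05-15", 2), rep("2016-05-30", 2)),
  Points = c(1, 4, 3, 2, 1, 5, 6, 1, 2, 3, 9),
  stringsAsFactors = FALSE
)

# The shift happens inside each Date group: the first row of every group
# gets NA and the remaining rows repeat the group's own date.
grouped <- df %>%
  group_by(Date) %>%
  mutate(Date_lagged = lag(Date)) %>%
  ungroup()

grouped$Date_lagged[1:4]
```

That is why the answer lags over the distinct Team/Date rows first and only then joins back.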

Or we can also do

df %>% 
    mutate(Date_Lagged = rep(lag(unique(Date)), table(Date))) 
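
Note that this one-liner lines up `unique(Date)` (order of first appearance) with `table(Date)` (sorted order), so it appears to depend on the rows already being sorted by date, as they are here. A possible variant that does not depend on the counts lining up is to look each row's date up in the vector of distinct dates. This is only a sketch: the names `dates` and `prev` are mine, `stringsAsFactors = FALSE` is an assumption, and "previous" still means "previous in order of first appearance".

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE is an assumption.
df <- data.frame(
  Team   = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"),
  Date   = c(rep("2016-05-10", 4), rep("2016-05-12", 3),
             rep("2016-05-15", 2), rep("2016-05-30", 2)),
  Points = c(1, 4, 3, 2, 1, 5, 6, 1, 2, 3, 9),
  stringsAsFactors = FALSE
)

dates <- unique(df$Date)   # distinct dates, in order of first appearance
prev  <- lag(dates)        # previous distinct date, NA for the first one

# Map every row to the lagged value of its own date
df$Date_Lagged <- prev[match(df$Date, dates)]
df
```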

3

You can also get there with rle in base R, for example:

with(rle(as.character(df$Date)), rep(c(NA, head(values, -1)), lengths)) 
# [1] NA   NA   NA   NA   "2016-05-10" "2016-05-10" 
# [7] "2016-05-10" "2016-05-12" "2016-05-12" "2016-05-15" "2016-05-15"
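
To store that result as a column, the same idea can be written out step by step. This is a sketch under two assumptions: `stringsAsFactors = FALSE` when building the data, and rows with the same date sitting next to each other, because rle only encodes consecutive runs. The name `runs` is mine.

```r
# Same data as in the question; stringsAsFactors = FALSE is an assumption.
df <- data.frame(
  Team   = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"),
  Date   = c(rep("2016-05-10", 4), rep("2016-05-12", 3),
             rep("2016-05-15", 2), rep("2016-05-30", 2)),
  Points = c(1, 4, 3, 2, 1, 5, 6, 1, 2, 3, 9),
  stringsAsFactors = FALSE
)

# Run-length encode the dates: one value and one length per consecutive run
runs <- rle(as.character(df$Date))

# Shift the run values down by one, then expand back to the run lengths
df$Date_Lagged <- rep(c(NA, head(runs$values, -1)), runs$lengths)
df
```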