2017-05-25 56 views
1

R: Days since last event per ID

I am interested in finding the number of days since the last event for each ID. The data looks like this:

df <- data.frame(date=as.Date(
c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001", 
"06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011"), "%d/%m/%Y"), 
event=c(0,0,1,0,1, 1,0,0,0,1),id = c(rep(1,5),rep(2,5))) 

         date event id
1  2000-07-06     0  1
2  2000-09-15     0  1
3  2000-10-15     1  1
4  2001-01-03     0  1
5  2001-03-17     1  1
6  2010-08-06     1  2
7  2010-09-15     0  2
8  2010-10-15     0  2
9  2011-01-03     0  2
10 2011-03-17     1  2

I borrowed heavily from a data.table solution here, but it does not take the IDs into account.

library(data.table) 
setDT(df) 
setkey(df, date,id) 

df = df[event == 1, .(lastevent = date), key = date][df, roll = TRUE] 
df[, tae := difftime(lastevent, shift(lastevent, 1L, "lag"), unit = "days")] 
df[event == 0, tae:= difftime(date, lastevent, unit = "days")] 

It produces the following output:

          date  lastevent event id       tae
 1: 2000-07-06       <NA>     0  1   NA days
 2: 2000-09-15       <NA>     0  1   NA days
 3: 2000-10-15 2000-10-15     1  1   NA days
 4: 2001-01-03 2000-10-15     0  1   80 days
 5: 2001-03-17 2001-03-17     1  1  153 days
 6: 2010-08-06 2010-08-06     1  2 3429 days
 7: 2010-09-15 2010-08-06     0  2   40 days
 8: 2010-10-15 2010-08-06     0  2   70 days
 9: 2011-01-03 2010-08-06     0  2  150 days
10: 2011-03-17 2011-03-17     1  2  223 days

But my desired output looks like this:

          date  lastevent event id      tae
 1: 2000-07-06       <NA>     0  1  NA days
 2: 2000-09-15       <NA>     0  1  NA days
 3: 2000-10-15 2000-10-15     1  1  NA days
 4: 2001-01-03 2000-10-15     0  1  80 days
 5: 2001-03-17 2001-03-17     1  1 153 days
 6: 2010-08-06 2010-08-06     1  2  NA days
 7: 2010-09-15 2010-08-06     0  2  40 days
 8: 2010-10-15 2010-08-06     0  2  70 days
 9: 2011-01-03 2010-08-06     0  2 150 days
10: 2011-03-17 2011-03-17     1  2 223 days

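The only difference is the NA in row 6 of column tae. This is a related post with no answer. I have also looked here, but that solution does not work in my case. There are many other similar questions, but none of them does the calculation per ID. Thanks!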

Answers

2
df <- data.table(date=as.Date(c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001","06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011"), 
"%d/%m/%Y"), event=c(0,0,1,0,1, 1,0,1,0,1),id = c(rep(1,5),rep(2,5))) 

tempdt <- df[event==1,] 

tempdt[,tae := date - shift(date), by = id] 

df <- merge(df, tempdt, by = c("date", "event", "id"), all.x = TRUE) 

df[, tae := ifelse(shift(event)==1, date - shift(date), tae), by = id] 

Edit

A more general solution

df <- data.table(date=as.Date(c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001", "18/03/2001", 
          "06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011","19/03/2011"), 
          "%d/%m/%Y"), 
      event=c(1,0,0,0,0,0,1,1,1,0,1,0),id = c(rep(1,6),rep(5,6))) 

##for event = 1 observations 
tempdt <- df[event==1,] 

tempdt[,tae := date - shift(date), by = id] 

df <- merge(df, tempdt, by = c("date", "event", "id"), all.x = TRUE) 

##for event = 0 observations 
for(d in df[event==0, date]){ 
    # print(as.Date(d, origin = "1970-01-01")) 
    df[date == d & event == 0, tae := as.Date(d, origin = "1970-01-01") - 
    max(df[date<d & event==1,date]), by = id] 
} 
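The loop above looks up, for every row, the most recent earlier event. As a rough sketch (not part of the original answer), the same lookup could be written as a rolling join on both id and date, which avoids the loop and keeps the IDs separate. The names events, lookup and lastevent are made up for illustration, and the sketch assumes dates are whole days, so that date - 1 stands for "strictly before this row".

```r
library(data.table)

df <- data.table(
  date = as.Date(c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001","18/03/2001",
                   "06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011","19/03/2011"),
                 "%d/%m/%Y"),
  event = c(1,0,0,0,0,0,1,1,1,0,1,0),
  id = c(rep(1,6), rep(5,6))
)

# One row per event; `lastevent` (hypothetical name) copies the date so that
# it can be pulled out of the join result.
events <- df[event == 1, .(id, date, lastevent = date)]

# Roll from the previous day so that an event row does not match itself
# (assumption: dates are whole days).
df[, lookup := date - 1]

# For each row of df, take the latest event of the same id on or before `lookup`.
le <- events[df, on = .(id, date = lookup), roll = TRUE, x.lastevent]
df[, lastevent := le]

# Days since that event; NA when the id has no earlier event.
df[, tae := as.integer(date - lastevent)]
df[, lookup := NULL]
print(df)
```

With this data the first row of each id gets NA, because there is no earlier event to count from.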

EDIT 2: There must be a faster way to do this, but this version won't produce any warnings if the first observation has event = 0.

df <- data.table(date=as.Date(c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001","06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011"), 
          "%d/%m/%Y"), event=c(0,0,1,0,1, 1,0,0,0,1),id = c(rep(1,5),rep(2,5))) 

tempdt <- df[event==1,] 

tempdt[,tae := date - shift(date), by = id] 

df <- merge(df, tempdt, by = c("date", "event", "id"), all.x = TRUE) 

for(i in unique(df[,id])){ 
    # non-event dates of this id that fall after its first event 
    for(d in df[id == i & date>df[id == i & event==1,min(date)] & event==0, date]){ 
    # days since the most recent earlier event of the same id 
    df[id == i & date == d & event == 0, 
    tae := as.Date(d, origin = "1970-01-01") - max(df[id == i & date<d & 
    event==1,date])] 
    } 
} 
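On the "faster way" mentioned in EDIT 2: one possible loop-free sketch, assuming data.table 1.12.4 or later (for fifelse() and nafill()). It is not the answer's method. It marks the date of each event row, shifts that by one row within each id, and carries the last known value forward. The helper columns ev_day and prev_ev are hypothetical names, and tae comes out as a plain integer number of days rather than a difftime.

```r
library(data.table)

df <- data.table(
  date = as.Date(c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001",
                   "06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011"),
                 "%d/%m/%Y"),
  event = c(0,0,1,0,1, 1,0,0,0,1),
  id = c(rep(1,5), rep(2,5))
)
setorder(df, id, date)

# Day number of the row if it is an event, NA otherwise.
df[, ev_day := fifelse(event == 1, as.integer(date), NA_integer_)]

# Lag by one row within each id (so an event does not count itself),
# then carry the last observed event day forward.
df[, prev_ev := nafill(shift(ev_day), type = "locf"), by = id]

# Days since the previous event of the same id; NA if there is none yet.
df[, tae := as.integer(date) - prev_ev]
df[, c("ev_day", "prev_ev") := NULL]
print(df)
```

On the question's data this gives NA in row 6, since id 2 has no event before its first row.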
+1

So simple. It hurts. Thanks a lot!

+0

Just wanted to mention that your code does not work for this data: df <- data.frame(date = as.Date(c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001","18/03/2001","06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011","19/03/2011"), "%d/%m/%Y"), event = c(1,0,0,0,0,0,1,1,1,0,1,0), id = c(rep(1,6),rep(5,6)))

+0

@HOSS_JFL let me know if the update works for you – simone