Merge 2 rows if the dates are the same or within ±7 days and the ID is the same

I've been trying to get my head around this, but I can't figure out how to do it.

Here is an example:

ID Hosp. date Discharge date 
1 2006-02-02 2006-02-04 
1 2006-02-04 2006-02-18 
1 2006-02-22 2006-03-24 
1 2008-08-09 2008-09-14 
2 2004-01-03 2004-01-08 
2 2004-01-13 2004-01-15 
2 2004-06-08 2004-06-28

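What I want is a way to combine rows by ID if the discharge date is the same as the Hosp. date (or within ±7 days of it) in the next row. So the result should look like this: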

ID Hosp. date Discharge date 
1 2006-02-02 2006-03-24 
1 2008-08-09 2008-09-14 
2 2004-01-03 2004-01-15 
2 2004-06-08 2004-06-28

Source

2017-09-24 Bobby Zhao Sheng Lo

Related: [Collapse rows with overlapping ranges](https://stackoverflow.com/questions/41747742/collapse-rows-with-overlapping-ranges) – Henrik

Using the data.table package:

# load the package 
library(data.table) 

# convert to a 'data.table' 
setDT(d) 
# make sure you have the correct order 
setorder(d, ID, Hosp.date) 

# summarise 
d[, grp := cumsum(Hosp.date > (shift(Discharge.date, fill = Discharge.date[1]) + 7)) 
    , by = ID 
    ][, .(Hosp.date = min(Hosp.date), Discharge.date = max(Discharge.date)) 
    , by = .(ID,grp)]

This gives:

ID grp Hosp.date Discharge.date 
1: 1 0 2006-02-02  2006-03-24 
2: 1 1 2008-08-09  2008-09-14 
3: 2 0 2004-01-03  2004-01-15 
4: 2 1 2004-06-08  2004-06-28
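
To make the grouping step easier to follow, here is a minimal base R sketch of how the `grp` index could be derived for the four rows of ID 1. The vector names (`hosp`, `disc`, `prev_disc`, `new_episode`) are illustrative only and do not appear in the answer's code; the rows are assumed to be sorted by admission date already.

```r
# Rows for ID 1 from the example, already sorted by admission date
hosp <- as.Date(c("2006-02-02", "2006-02-04", "2006-02-22", "2008-08-09"))
disc <- as.Date(c("2006-02-04", "2006-02-18", "2006-03-24", "2008-09-14"))

# Discharge date of the previous row; the first row is compared with
# its own discharge date, so it can never start a new group
prev_disc <- c(disc[1], head(disc, -1))

# TRUE where the admission is more than 7 days after the previous discharge
new_episode <- hosp > prev_disc + 7

# Running count of episode starts, used as the group index
grp <- cumsum(new_episode)
print(data.frame(hosp, prev_disc, new_episode, grp))
```

Only the last row is admitted more than 7 days after the previous discharge, so the first three rows share group 0 and the last row falls into group 1.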

The same logic with dplyr:

library(dplyr) 
d %>% 
    arrange(ID, Hosp.date) %>% 
    group_by(ID) %>% 
    mutate(grp = cumsum(Hosp.date > (lag(Discharge.date, default = Discharge.date[1]) + 7))) %>% 
    group_by(grp, .add = TRUE) %>%  # '.add' replaces the deprecated 'add' argument (dplyr >= 1.0.0)
    summarise(Hosp.date = min(Hosp.date), Discharge.date = max(Discharge.date))
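
If neither package is available, the same idea can probably be expressed in base R as well. The following is only a sketch under the same assumptions as above (rows sorted by ID and admission date, each row compared only with the row directly before it); the data frame is rebuilt here from the dates in the question so that the block runs on its own, and `pieces` and `res` are illustrative names.

```r
# Example data, rebuilt from the dates shown in the question
d <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  Hosp.date = as.Date(c("2006-02-02", "2006-02-04", "2006-02-22", "2008-08-09",
                        "2004-01-03", "2004-01-13", "2004-06-08")),
  Discharge.date = as.Date(c("2006-02-04", "2006-02-18", "2006-03-24", "2008-09-14",
                             "2004-01-08", "2004-01-15", "2004-06-28"))
)

# Make sure the rows are in the correct order
d <- d[order(d$ID, d$Hosp.date), ]

# Group index per ID: increases whenever an admission is more than
# 7 days after the previous row's discharge
d$grp <- ave(seq_len(nrow(d)), d$ID, FUN = function(i) {
  prev_disc <- c(d$Discharge.date[i][1], head(d$Discharge.date[i], -1))
  cumsum(d$Hosp.date[i] > prev_disc + 7)
})

# Collapse each (ID, grp) combination to its first admission and last discharge
pieces <- lapply(split(d, list(d$ID, d$grp), drop = TRUE), function(x) {
  data.frame(ID = x$ID[1], grp = x$grp[1],
             Hosp.date = min(x$Hosp.date),
             Discharge.date = max(x$Discharge.date))
})
res <- do.call(rbind, pieces)
res <- res[order(res$ID, res$grp), ]
rownames(res) <- NULL
print(res)
```

Building each summary row by hand keeps the `Date` class of both columns intact, which is the main reason this sketch does not rely on a one-line aggregation call.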

Used data:

d <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), 
        Hosp.date = structure(c(13181, 13183, 13201, 14100, 12420, 12430, 12577), class = "Date"), 
        Discharge.date = structure(c(13183, 13197, 13231, 14136, 12425, 12432, 12597), class = "Date")), 
       .Names = c("ID", "Hosp.date", "Discharge.date"), class = "data.frame", row.names = c(NA, -7L))

Source

2017-09-24 11:17:57 Jaap
