2017-09-01
3

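Insert rows between dates by group

I want to insert rows between two dates, by group. My current approach is quite convoluted: I fill in the missing values using the last observation and then merge. Is there a simpler way to do this?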

# sample data 
user<-c("A","A","B","B","B") 
dummy<-c(1,1,1,1,1) 
date<-as.Date(c("2017/1/3","2017/1/6","2016/5/1","2016/5/3","2016/5/5")) 
dt<-data.frame(user,dummy,date) 

  user dummy       date
1    A     1 2017-01-03
2    A     1 2017-01-06
3    B     1 2016-05-01
4    B     1 2016-05-03
5    B     1 2016-05-05

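Desired output: one row per day between each user's first and last date, with dummy set to 0 on the inserted rows (the original post showed this as an image).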

Answers

6

Using dplyr and tidyr :) (a one-line solution)

library(dplyr) 
library(tidyr) 
dt %>% group_by(user) %>% complete(date=full_seq(date,1),fill=list(dummy=0)) 
# A tibble: 9 x 3 
# Groups: user [2] 
    user  date dummy 
    <fctr>  <date> <dbl> 
1  A 2017-01-03  1 
2  A 2017-01-04  0 
3  A 2017-01-05  0 
4  A 2017-01-06  1 
5  B 2016-05-01  1 
6  B 2016-05-02  0 
7  B 2016-05-03  1 
8  B 2016-05-04  0 
9  B 2016-05-05  1 
2

You can try this:

library(data.table) 
setDT(dt) 
tmp <- dt[, .(date = seq.Date(min(date), max(date), by = '1 day')), by = 'user']
dt <- merge(tmp, dt, by = c('user', 'date'), all.x = TRUE) 
dt[, dummy := ifelse(is.na(dummy), 0, dummy)] 
+0

Just one suggestion: the OP is working with a 'data.frame' :) – Wen
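
For readers who want to stay with a plain data.frame, the same idea (build a per-user grid of days, then merge the original rows back in) can be sketched in base R. This is only an illustration under the question's sample data; the names `grid` and `filled` are made up for this example.

```r
# Sample data from the question
dt <- data.frame(
  user  = c("A", "A", "B", "B", "B"),
  dummy = c(1, 1, 1, 1, 1),
  date  = as.Date(c("2017-01-03", "2017-01-06",
                    "2016-05-01", "2016-05-03", "2016-05-05"))
)

# One row per user per day between that user's first and last date
grid <- do.call(rbind, lapply(split(dt, dt$user), function(d) {
  data.frame(user = d$user[1],
             date = seq(min(d$date), max(d$date), by = "day"))
}))

# Left join the original rows onto the grid; days without a match get NA
filled <- merge(grid, dt, by = c("user", "date"), all.x = TRUE)

# Inserted days get dummy = 0
filled$dummy[is.na(filled$dummy)] <- 0

# Restore the original column order and tidy the row names
filled <- filled[order(filled$user, filled$date), c("user", "dummy", "date")]
rownames(filled) <- NULL
```

Here `merge(..., all.x = TRUE)` plays the role of the left join in the data.table version, so no extra packages are needed.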

2

We can use the tidyverse to accomplish this.

library(tidyverse) 

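dt2 <- dt %>% 
    group_by(user) %>% 
    # build the full daily sequence for each user as a list-column
    do(date = seq(from = min(.$date), to = max(.$date), by = 1)) %>% 
    # name the column explicitly; a bare unnest() is deprecated in newer tidyr
    unnest(date) %>% 
    # bring the original rows back; inserted days have dummy = NA
    left_join(dt, by = c("user", "date")) %>% 
    replace_na(list(dummy = 0)) %>% 
    select(colnames(dt)) 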

dt2 
# A tibble: 9 x 3 
    user dummy  date 
    <fctr> <dbl>  <date> 
1  A  1 2017-01-03 
2  A  0 2017-01-04 
3  A  0 2017-01-05 
4  A  1 2017-01-06 
5  B  1 2016-05-01 
6  B  0 2016-05-02 
7  B  1 2016-05-03 
8  B  0 2016-05-04 
9  B  1 2016-05-05 
1

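Assuming your data is called df1 and you want to add the dates in a given range, try this:

library(dplyr) 
# left_join() needs a data frame, so wrap the date sequence in one;
# adjust the start and end dates to your own data
df2 <- data.frame(date = seq.Date(as.Date("2015-01-03"), as.Date("2015-01-06"), by = "day")) 
left_join(df2, df1, by = "date") 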

If you just want to add a single new record, I suggest using rbind.

rbind() 
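
As a minimal sketch of that rbind suggestion, using the question's sample data: the record in `new_row` is a hypothetical example, not something from the original post, and the columns must match those of `dt`.

```r
# Sample data from the question
dt <- data.frame(
  user  = c("A", "A", "B", "B", "B"),
  dummy = c(1, 1, 1, 1, 1),
  date  = as.Date(c("2017-01-03", "2017-01-06",
                    "2016-05-01", "2016-05-03", "2016-05-05"))
)

# Hypothetical record to add (same column names and types as dt)
new_row <- data.frame(user = "A", dummy = 0, date = as.Date("2017-01-04"))

# Append the record, then sort so it sits in date order within its user
dt_more <- rbind(dt, new_row)
dt_more <- dt_more[order(dt_more$user, dt_more$date), ]
rownames(dt_more) <- NULL
```

This only adds one row at a time, so it does not scale to filling every missing day per group.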
2

The easiest way I have found to do this is with the padr library.

library(padr) 
library(tidyr)  # provides replace_na() and the %>% pipe 
dt_padded <- pad(dt, group = "user", by = "date") %>% 
    replace_na(list(dummy=0)) 
+1

'fillna' to 0 :) – Wen

+0

@Wen, thanks! I missed that. – roarkz

2

A base R (not so elegant) solution:

# Data 
user<-c("A","A","B","B","B") 
dummy<-c(1,1,1,1,1) 
date<-as.Date(c("2017/1/3","2017/1/6","2016/5/1","2016/5/3","2016/5/5")) 
df1 <-data.frame(user,dummy,date) 

# Solution 
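do.call(rbind, lapply(split(df1, df1$user), function(df) { 
    # full daily sequence for this user, with dummy defaulting to 0 
    dff <- data.frame(user=df$user[1], dummy=0, date=seq.Date(min(df$date), max(df$date), 'day')) 
    # copy each original dummy value onto its own date (not just the first value) 
    dff$dummy[match(df$date, dff$date)] <- df$dummy 
    dff 
})) 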


# user dummy date  
# A 1  2017-01-03 
# A 0  2017-01-04 
# A 0  2017-01-05 
# A 1  2017-01-06 
# B 1  2016-05-01 
# B 0  2016-05-02 
# B 1  2016-05-03 
# B 0  2016-05-04 
# B 1  2016-05-05