R: days since the last non-NA value, by group

2017-05-24

I have a data frame that looks like this:

df_raw <- structure(list(date = structure(c(17075, 17076, 17077, 17108, 
17109, 17110, 17111, 17112, 17113, 17221, 17222, 17223, 17224, 
17225, 17226, 17227, 17228, 17229, 17230, 17231, 17232, 17286, 
17075, 17076, 17077, 17078, 17079, 17080, 17081, 17082, 17083, 
17084, 17085, 17086, 17087, 17088, 17089, 17090, 17091), class = "Date"), 
    Req_BU = c("12018", "12018", "12018", "12018", "12018", "12018", 
    "12018", "12018", "12018", "12018", "12018", "12018", "12018", 
    "12018", "12018", "12018", "12018", "12018", "12018", "12018", 
    "12018", "12018", "14004", "14004", "14004", "14004", "14004", 
    "14004", "14004", "14004", "14004", "14004", "14004", "14004", 
    "14004", "14004", "14004", "14004", "14004"), last_rec_date = c(1L, 
    1L, 1L, 1L, 1L, NA, NA, 3L, 1L, 1L, 1L, NA, 2L, 1L, 1L, 1L, 
    1L, 1L, NA, NA, 3L, 1L, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, NA, 
    3L, 1L, 1L, 1L, 1L, NA, 2L, 1L)), .Names = c("date", "Req_BU", 
"last_rec_date"), row.names = c(NA, -39L), class = "data.frame") 


> head(df_raw, 10) 
     date Req_BU last_rec_date 
1 2016-10-01 12018    1 
2 2016-10-02 12018    1 
3 2016-10-03 12018    1 
4 2016-11-03 12018    1 
5 2016-11-04 12018    1 
6 2016-11-05 12018   NA 
7 2016-11-06 12018   NA 
8 2016-11-07 12018    3 
9 2016-11-08 12018    1 
10 2017-02-24 12018    1 

> df_raw[22:30, ] 
     date Req_BU last_rec_date 
22 2017-04-30 12018    1 
23 2016-10-01 14004   NA 
24 2016-10-02 14004   NA 
25 2016-10-03 14004    1 
26 2016-10-04 14004    1 
27 2016-10-05 14004    1 
28 2016-10-06 14004    1 
29 2016-10-07 14004    1 
30 2016-10-08 14004   NA 

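What I need to do is replace the NA values in last_rec_date with the number of days since the last non-NA value. All of this has to be done per group, using the grouping variable Req_BU. My data starts on 2016-10-01; if a particular Req_BU starts with an NA on that date, I need to fill in 1 and keep doing so until there is a non-NA value, at which point the normal logic takes over.

I'm looking for something like this.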

> head(df_hope, 10) 
     date Req_BU last_rec_date 
1 2016-10-01 12018    1 
2 2016-10-02 12018    1 
3 2016-10-03 12018    1 
4 2016-11-03 12018    1 
5 2016-11-04 12018    1 
6 2016-11-05 12018    1 
7 2016-11-06 12018    2 
8 2016-11-07 12018    3 
9 2016-11-08 12018    1 
10 2017-02-24 12018    1 

> df_hope[22:30, ] 
     date Req_BU last_rec_date 
22 2017-04-30 12018    1 
23 2016-10-01 14004    1 
24 2016-10-02 14004    1 
25 2016-10-03 14004    1 
26 2016-10-04 14004    1 
27 2016-10-05 14004    1 
28 2016-10-06 14004    1 
29 2016-10-07 14004    1 
30 2016-10-08 14004    1 

I tried the following, but it doesn't even handle the first part of the logic I need.

library(dplyr) 
df_not_working <- df_raw %>% 
    group_by(Req_BU) %>% 
    mutate(last_rec_date = ifelse(is.na(last_rec_date), 
           c(NA, diff(date)), 
            last_rec_date)) 

> df_not_working 
Source: local data frame [39 x 3] 
Groups: Req_BU [2] 

# A tibble: 39 x 3 
     date Req_BU last_rec_date 
     <date> <chr>   <dbl> 
1 2016-10-01 12018    1 
2 2016-10-02 12018    1 
3 2016-10-03 12018    1 
4 2016-11-03 12018    1 
5 2016-11-04 12018    1 
6 2016-11-05 12018    1 
7 2016-11-06 12018    1 
8 2016-11-07 12018    3 
9 2016-11-08 12018    1 
10 2017-02-24 12018    1 

The rest of the analysis is fairly dplyr-heavy, so I'm fine with using either that or a base R solution (if one exists). Thanks.

Answer

Maybe this will work? It's not very R-ish, so perhaps someone has a better way.

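# Fill each run of NA values in column `colname` with a running count 1, 2, 3, ...
# The counter resets to 1 whenever a non-NA value is seen.
# Note: this counts rows, not calendar days, so it assumes one row per day.
fill_na <- function(df, colname) {
    x <- 1
    for (i in seq_len(nrow(df))) {
        if (is.na(df[i, colname])) {
            df[i, colname] <- x
            x <- x + 1
        } else {
            x <- 1
        }
    }
    return(df)
}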

df_hope <- unsplit(lapply(split(df_raw, f = df_raw$Req_BU), fill_na, colname = "last_rec_date"), f = df_raw$Req_BU) 
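
For what it's worth, the same row-counting idea can probably be written without an explicit loop. The sketch below is only an illustration: `count_na_runs` is a hypothetical helper name, and the toy vectors are made up for the demonstration, not taken from `df_raw`. It uses `rle()` to find runs of NA, `sequence()` to number the positions within each run, and `ave()` to apply the helper per group.

```r
# Hypothetical helper: replace each run of NAs with 1, 2, 3, ...
# (counts rows, like fill_na above, so it assumes one row per day)
count_na_runs <- function(x) {
  na <- is.na(x)
  runs <- rle(na)
  # Position of every element within its own run: 1, 2, 3, ...
  pos <- sequence(runs$lengths)
  x[na] <- pos[na]
  x
}

# Toy data (made up): an NA run crosses the boundary between the two groups
vals <- c(1L, NA, NA, 3L, NA, NA, NA, NA, 1L)
grp  <- rep(c("12018", "14004"), c(5, 4))

# ave() applies the helper within each group and keeps the original order
filled <- ave(vals, grp, FUN = count_na_runs)
print(filled)
```

Because `ave()` works group by group, the counter restarts at the group boundary instead of continuing across it.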

Edit: I made a clearer example to test the approach:

example_df <- structure(list(date = structure(c(17075, 17076, 17077, 17108, 
17109, 17083, 17084, 17085, 17086, 17087), class = "Date"), Req_BU = c("12018", 
"12018", "12018", "12018", "12018", "14004", "14004", "14004", 
"14004", "14004"), last_rec_date = c(1L, 1L, 1L, NA, NA, NA, 
NA, NA, 1L, 1L)), .Names = c("date", "Req_BU", "last_rec_date" 
), row.names = c(1L, 2L, 3L, 4L, 5L, 31L, 32L, 33L, 34L, 35L), class = "data.frame") 

> example_df 
     date Req_BU last_rec_date 
1 2016-10-01 12018    1 
2 2016-10-02 12018    1 
3 2016-10-03 12018    1 
4 2016-11-03 12018   NA 
5 2016-11-04 12018   NA 
31 2016-10-09 14004   NA 
32 2016-10-10 14004   NA 
33 2016-10-11 14004   NA 
34 2016-10-12 14004    1 
35 2016-10-13 14004    1 

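Starting with a data frame in which the NA values cross the "boundary" between the Req_BU values 12018 and 14004, split the data frame by Req_BU into a list of separate data frames. Then use lapply to apply the function above to each individual data frame, before using unsplit to return a single data frame.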

df_ex <- unsplit(lapply(split(example_df, f = example_df$Req_BU), fill_na, colname = "last_rec_date"), f = example_df$Req_BU) 

> df_ex 
     date Req_BU last_rec_date 
1 2016-10-01 12018    1 
2 2016-10-02 12018    1 
3 2016-10-03 12018    1 
4 2016-11-03 12018    1 
5 2016-11-04 12018    2 
31 2016-10-09 14004    1 
32 2016-10-10 14004    2 
33 2016-10-11 14004    3 
34 2016-10-12 14004    1 
35 2016-10-13 14004    1
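

Note that the result above numbers leading NAs 1, 2, 3 and counts rows rather than days. If the goal is what I understand the question to describe (leading NAs become 1, and other NAs become the number of calendar days since the last non-NA row), something along these lines might be closer. This is a minimal base R sketch under that assumption; `days_since_last_obs`, `fill_group`, and the toy data frame are hypothetical names and data introduced only for illustration.

```r
# Hypothetical helper: for each NA, days elapsed since the last non-NA row;
# NAs that come before any non-NA value are set to 1.
days_since_last_obs <- function(date, value) {
  obs <- !is.na(value)
  # Index of the most recent non-NA row at or before each position (0 if none)
  last_idx <- cummax(seq_along(value) * obs)
  gap <- !obs & last_idx > 0
  # Date values are stored as day counts, so the difference is in days
  value[gap] <- as.integer(as.numeric(date[gap]) - as.numeric(date[last_idx[gap]]))
  value[!obs & last_idx == 0] <- 1L
  value
}

# Toy data (made up): the second group starts with NAs
toy <- data.frame(
  date = as.Date("2016-10-01") + c(0, 1, 2, 3, 4, 0, 1, 2, 3),
  Req_BU = rep(c("12018", "14004"), c(5, 4)),
  last_rec_date = c(1L, NA, NA, 3L, 1L, NA, NA, 1L, NA),
  stringsAsFactors = FALSE
)

# Same split / lapply / unsplit pattern as above
fill_group <- function(d) {
  d$last_rec_date <- days_since_last_obs(d$date, d$last_rec_date)
  d
}
toy_filled <- unsplit(lapply(split(toy, toy$Req_BU), fill_group), toy$Req_BU)
print(toy_filled)
```

Since the rest of the analysis is dplyr-based, the same helper could presumably be called inside a pipeline as `group_by(Req_BU) %>% mutate(last_rec_date = days_since_last_obs(date, last_rec_date))`, assuming the rows are already sorted by date within each group.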