2017-10-11

Capping values with the mean in R using lag and dplyr

I have the following data frame in R:

    name  date        month  year  hours
    SSI   01-01-2016  01     2016  2000
    SSI   02-01-2016  01     2016  1900
    SSI   03-01-2016  01     2016  2038
    SSI   04-01-2016  01     2016  2041
    SSII  01-01-2016  01     2016  2000
    SSII  02-01-2016  01     2016  2100
    SSII  03-01-2016  01     2016  2105
    SSII  04-01-2016  01     2016  2203
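
For anyone who wants to reproduce this, the sample data can be rebuilt roughly as follows. This is only a sketch: the column types (character for `name`, `date` and `month`, numeric for `year` and `hours`) are an assumption, since the table does not show them.

```r
# Rebuild the sample data from the table above.
# Column types are assumed; the table itself does not state them.
df <- data.frame(
  name  = rep(c("SSI", "SSII"), each = 4),
  date  = rep(c("01-01-2016", "02-01-2016", "03-01-2016", "04-01-2016"), times = 2),
  month = "01",   # recycled to all 8 rows
  year  = 2016,   # recycled to all 8 rows
  hours = c(2000, 1900, 2038, 2041, 2000, 2100, 2105, 2203),
  stringsAsFactors = FALSE
)

str(df)
```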

I want to calculate the lag of hours for each name, grouped by month and year, which I can do with the following code:

df1 <- df %>% 
    group_by(name,year,month) %>% 
    mutate(running_hrs = hours- lag(hours)) %>% 
    as.data.frame() 

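What I want is this: wherever running_hrs is greater than 24 or less than 0, I want to cap the value with the mean for that month. This is what I am doing at the moment, followed by the output I am after: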

new_df <- df%>% 
    group_by(name,year,month) %>% 
    mutate(running_hrs = hours- lag(hours)) %>% 
    mutate(running_hrs_new = ifelse(running_hrs > 24 | running_hrs < 0,mean(running_hrs),running_hrs)) %>% 
    as.data.frame() 

    name  date        month  year  hours  running_hrs  running_hrs_new
    SSI   01-01-2016  01     2016  2000   NA
    SSI   02-01-2016  01     2016  1900   -100         (3/4)
    SSI   03-01-2016  01     2016  2038   138          (3/4)
    SSI   04-01-2016  01     2016  2041   3            3
    SSII  01-01-2016  01     2016  2000   NA
    SSII  02-01-2016  01     2016  2100   100          (10/4)
    SSII  03-01-2016  01     2016  2105   5            5
    SSII  04-01-2016  01     2016  2110   5            5

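The out-of-range values should be replaced by the mean of the running hours that are less than 24 and greater than or equal to zero. I think a conditional mean could be used here.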
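
A note on the attempt above: `lag()` leaves an `NA` in the first row of every group, so `mean(running_hrs)` without `na.rm = TRUE` is itself `NA`, and every capped value ends up as `NA`. A small base R sketch using the SSI differences from the table (the vector `x` is only an illustration, not part of the original code):

```r
# Differences for SSI, as produced by hours - lag(hours)
x <- c(NA, -100, 138, 3)

# Plain mean is NA because of the leading NA from lag()
mean(x)

# Flag the in-range values (0 to 24), treating NA as not in range
valid <- !is.na(x) & x >= 0 & x <= 24

# Conditional mean: average over in-range values only
mean(x[valid])

# In-range sum divided by all rows in the group, as in the (3/4) notation
sum(x[valid]) / length(x)
```

The last two lines give different numbers (3 versus 0.75 here), so it is worth deciding which of the two definitions of "mean for the month" is wanted.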

Answer


Hope this helps!

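library(dplyr) 
library(tidyr) 

new_df <- df %>% 
    group_by(name, year, month) %>% 
    # difference from the previous row within each name/year/month group
    mutate(running_hrs = hours - lag(hours)) %>% 
    # keep in-range differences, set out-of-range ones to 0
    mutate(valid_running_hrs = ifelse(running_hrs < 24 & running_hrs > 0, running_hrs, 0)) %>% 
    # the first row of each group is NA after lag(); count it as 0 as well
    replace_na(list(valid_running_hrs = 0)) %>% 
    group_by(name, year, month) %>% 
    # replace out-of-range differences with the group mean of valid_running_hrs
    mutate(running_hrs_new = ifelse(running_hrs > 24 | running_hrs < 0, mean(valid_running_hrs), running_hrs)) %>% 
    as.data.frame()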
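
Note that the code above divides by every row in the group, because the `NA` row and the out-of-range rows are counted as zeros; that is what produces values like 3/4. If the mean should instead be taken over the in-range values only (a conditional mean in the stricter sense), a variant along these lines may work. It is a sketch, not a tested drop-in replacement: the `in_range` helper column is a name introduced here for illustration, the rows are assumed to be sorted by date within each group already, and the sample data is rebuilt from the first table in the question.

```r
library(dplyr)

# Sample data rebuilt from the question (date column omitted for brevity)
df <- data.frame(
  name  = rep(c("SSI", "SSII"), each = 4),
  month = "01",
  year  = 2016,
  hours = c(2000, 1900, 2038, 2041, 2000, 2100, 2105, 2203)
)

new_df <- df %>%
  group_by(name, year, month) %>%
  mutate(
    # difference from the previous row within the group
    running_hrs = hours - lag(hours),
    # TRUE only for non-missing differences between 0 and 24 (hypothetical helper column)
    in_range = !is.na(running_hrs) & running_hrs >= 0 & running_hrs <= 24,
    # keep in-range and NA rows as they are; cap the rest with the
    # group mean of the in-range values (NaN if a group has none)
    running_hrs_new = ifelse(in_range | is.na(running_hrs),
                             running_hrs,
                             mean(running_hrs[in_range]))
  ) %>%
  ungroup() %>%
  as.data.frame()

new_df
```

With this variant the first row of each group stays `NA`, and the capped rows receive the average of the in-range differences rather than that sum divided by the group size.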