2017-01-07

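Moving mean as a function in dplyr

I want to write a function that computes a moving mean over a variable number of previous observations, and for different variables. Take this as mock data: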

df = expand.grid(site = factor(seq(10)), 
       year = 2000:2004, 
       day = 1:50) 
df$temp = rpois(dim(df)[1], 5) 

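Doing this for one variable and a fixed number of previous observations works. For example, this computes the mean temperature over the previous 5 days: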

library(dplyr) 
library(zoo) 

df <- df %>% 
    group_by(site, year) %>% 
    arrange(site, year, day) %>% 
    mutate(almost_avg = rollmean(x = temp, 5, align = "right", fill = NA)) %>% 
    mutate(avg = lag(almost_avg, 1)) 

So far so good. But my attempt to turn this into a function fails:

avg_last_x <- function(dataframe, column, last_x) { 

    dataframe <- dataframe %>% 
    group_by(site, year) %>% 
     arrange(site, year, day) %>% 
     mutate(almost_avg = rollmean(x = column, k = last_x, align = "right", fill = NA)) %>% 
      mutate(avg = lag(almost_avg, 1)) 

    return(dataframe) } 

avg_last_x(dataframe = df, column = "temp", last_x = 10) 

I get this error:

Error in mutate_impl(.data, dots) : k <= n is not TRUE 

I understand this is probably related to the evaluation mechanism in dplyr, but I can't get it fixed.

Thanks in advance for your help.

Answer

This should fix it.

library(lazyeval) 

avg_last_x <- function(dataframe, column, last_x) { 
    dataframe %>% 
    group_by(site, year) %>% 
    arrange(site, year, day) %>% 
    mutate_(almost_avg = interp(~rollmean(x = c, k = last_x, align = "right", 
              fill = NA), c = as.name(column)), 
      avg = ~lag(almost_avg, 1)) 
}
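
Some background on the error, plus an alternative sketch. In the question's version, column = "temp" reaches rollmean() as the literal string "temp", a character vector of length 1, so the check k <= n fails for any last_x greater than 1. The interp() call above fixes that by turning the string into a column name before evaluation. In later dplyr releases the underscore verbs such as mutate_() were deprecated, so the following is a minimal sketch of the same function using the .data pronoun instead. It assumes a dplyr version that supports .data[[...]] inside mutate(), and it has only been checked against the mock data from the question.

```r
library(dplyr)
library(zoo)

# Same idea as above, but looking the column up by its string name
# via the .data pronoun (assumes a dplyr version that provides it).
avg_last_x <- function(dataframe, column, last_x) {
  dataframe %>%
    group_by(site, year) %>%
    arrange(site, year, day) %>%
    mutate(
      # right-aligned rolling mean over last_x rows, including the current one
      almost_avg = rollmean(.data[[column]], k = last_x,
                            align = "right", fill = NA),
      # shift by one row so the current observation is excluded
      avg = dplyr::lag(almost_avg, 1)
    )
}

# Mock data as in the question
df <- expand.grid(site = factor(seq(10)),
                  year = 2000:2004,
                  day = 1:50)
df$temp <- rpois(nrow(df), 5)

res <- avg_last_x(dataframe = df, column = "temp", last_x = 10)
```

Within each site/year group the first last_x rows of avg are NA, because there are not yet last_x earlier observations to average.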