2017-08-11

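Create an indicator variable for y if x occurs in the future

I am trying to create a dummy variable based on whether x occurred within 1 year of row A. I think this is probably a common problem, and similar questions have already been posted (I found this one to be the most similar). Unfortunately, the zoo package is not a good fit because it does not handle irregularly spaced dates well (I don't want to aggregate rows, and my data is too large for that). I have been trying, unsuccessfully, to work out a data.table approach, although I would rather build on what I already have experience with.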

dates <- rep(as.Date(c('2015-01-01', '2015-02-02', '2015-03-03', '2016-02-02'), '%Y-%m-%d'), 3) 

names <- c(rep('John', 4), rep('Phil', 4), rep('Ty', 4)) 

df <- data.frame(Name = names, Date = dates, 
      did_y = c(0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0), 
      did_x = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1)) 

Name  Date        did_y  did_x
John  2015-01-01  0      1
John  2015-02-02  1      0
John  2015-03-03  1      0
John  2016-02-02  0      0
Phil  2015-01-01  0      0
Phil  2015-02-02  0      1
Phil  2015-03-03  0      1
Phil  2016-02-02  0      0
Ty    2015-01-01  1      0
Ty    2015-02-02  1      0
Ty    2015-03-03  0      0
Ty    2016-02-02  0      1

What I want is:

dffinal <- data.frame(Name = names, Date = dates, 
        did_y = c(0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0), 
        did_x = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1), 
        did_x_within_year = c(1, 1, 1, NA, 1, 1, 1, 1, 0, 1, 1, 1), 
        did_x_next_year = c(0, 0, 0, NA, 1, 1, 0, NA, 0, 1, 1, NA)) 

Name  Date        did_y  did_x  did_x_within_year  did_x_next_year
John  2015-01-01  0      1      1                  0
John  2015-02-02  1      0      1                  0
John  2015-03-03  1      0      1                  0
John  2016-02-02  0      0      NA                 NA
Phil  2015-01-01  0      0      1                  1
Phil  2015-02-02  0      1      1                  1
Phil  2015-03-03  0      1      1                  0
Phil  2016-02-02  0      0      1                  NA
Ty    2015-01-01  1      0      0                  0
Ty    2015-02-02  1      0      1                  1
Ty    2015-03-03  0      0      1                  1
Ty    2016-02-02  0      1      1                  NA

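So I want two columns: one that flags whether x occurred within 1 year of row A (either before or after), and another that flags whether x occurs within the following 1 year.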

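I experimented with RcppRoll, but it seems to only look backward in time: it flags a row if x happened up to a year earlier, but not if x happens within the coming year.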

# n = 365 is a window of 365 rows (observations), not 365 calendar days
df$did_x_next_year <- RcppRoll::roll_max(df$did_x, 365, fill = NA)

Edit: attempted solution based on another question

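I tried to implement this solution (1b), but unfortunately nothing in my data frame / data table actually changes. Even when I keep the function exactly as in the example and apply it to my data, the column is not updated.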

library(zoo) 
library(data.table) 
df$Year <- lubridate::year(df$Date) 
df$Month <- lubridate::month(df$Date) 
df$did_x_next_year <- df$did_x 

DT <- as.data.table(df) 

k <- 12 # prior 12 months 

# inputs zoo object x, subsets it to the specified window and takes the max
Max2 <- function(x) { 
    w <- window(x, start = end(x) - k/12, end = end(x) - 1/12) 
    if (length(w) == 0 || all(is.na(w))) NA_real_ else max(w, na.rm = TRUE) 
} 

nms <- names(DT)[7] 

setkey(DT, Name, Year, Month) # sort 

# create zoo object from arguments and run rollapplyr using Max2
roll2 <- function(x, year, month) { 
    z <- zoo(x, as.yearmon(year + (month - 1)/12)) 
    coredata(rollapplyr(z, k+1, Max2, coredata = FALSE, partial = TRUE)) 
} 

DT <- DT[, nms := lapply(.SD, roll2, Year, Month), .SDcols = nms, by = "Name"] 
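
One possible reason the column looks unchanged, offered as an assumption rather than a confirmed diagnosis: in data.table, a bare symbol on the left of `:=` is taken as a literal column name, so `nms := ...` writes to a column called "nms" instead of the column whose name is stored in `nms`. Wrapping the variable in parentheses makes data.table use its value. The toy table and its column names below are illustrative only.

```r
library(data.table)

# Toy table; the data and column names are made up for illustration
DT <- data.table(Name = c("John", "John", "Phil"), did_x = c(1, 0, 1))
nms <- "did_x"  # name of the column we intend to overwrite

# Bare symbol on the left: creates a new column literally called "nms"
DT[, nms := did_x * 10]
print(names(DT))  # did_x itself is untouched

DT[, nms := NULL]  # drop the accidental column again

# Parenthesised: the value of `nms` is used, so did_x is updated by group
DT[, (nms) := lapply(.SD, function(v) v * 10), .SDcols = nms, by = "Name"]
print(DT)
```

This only demonstrates the assignment semantics; it does not check whether the zoo-based window logic itself gives the desired flags.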
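Does row A mean row 1?

Well, I am grouping the data by the Name column, and I am looking for a time window that rolls over each row, so that the calculation looks both forward and backward from each row's date. – vino88

So you want a rolling mean or an interpolation?

Answer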

Following a suggestion from a friend, I came up with the following:

library(dplyr)

# Filtering to the obs I care about
dfadd <- df %>% filter(did_x == 1) %>% select(Name, Date) %>% rename(x_date = Date)

# Converting to character since in dcast it screws up the dates 
dfadd$x_date <- as.character(dfadd$x_date) 

# Merging data 
df <- plyr::join(df, dfadd, by = 'Name') 

# Creating new column used for dcasting 
df <- df %>% group_by(Name, Date) %>% mutate(x_date_index = seq(from = 1, to = n())) 
df$x_date_index <- paste0('x_date_',df$x_date_index) 

#casting the data wide 
df <- reshape2::dcast(df, 
        Name + Date + did_y + did_x ~ x_date_index, 
        value.var = "x_date", 
        fill = NA) 

# Converting back to Date
df$x_date_1 <- as.Date(df$x_date_1) 
df$x_date_2 <- as.Date(df$x_date_2) 

# Creating dummy variables
df$did_x_within_year <- 0
# abs() so that x counts whether it falls before or after the row's date
df$did_x_within_year <- ifelse(abs(as.numeric(df$x_date_1 - df$Date)) <= 366, 1,
                               df$did_x_within_year)

df$did_x_next_year <- 0
df$did_x_next_year <- ifelse((df$x_date_1 > df$Date) & (df$x_date_1 - df$Date <= 365),
                             1, df$did_x_next_year)

# Can extend to account for x_date_2, x_date_3, etc 

# Changing the last entry to NA as desired 
df <- df %>% group_by(Name) %>% mutate(did_x_next_year = c(did_x_next_year[-n()], NA)) 
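
As a rough alternative that avoids casting to x_date_1, x_date_2, and so on, the same flags can be sketched in base R by comparing every row date with every x date for the same person. This is only a sketch under assumptions: `flag_window()` is a hypothetical helper (not from any package), the window is 365 days, and the sample data is rebuilt without did_y since it is not used.

```r
dates <- rep(as.Date(c("2015-01-01", "2015-02-02", "2015-03-03", "2016-02-02")), 3)
df <- data.frame(
  Name  = rep(c("John", "Phil", "Ty"), each = 4),
  Date  = dates,
  did_x = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1)
)

# Hypothetical helper: for one person's rows, flag whether any x event
# falls inside the window around (or after) each row's date.
flag_window <- function(date, did_x, days = 365) {
  x_dates <- date[did_x == 1]
  # gap[i, j] = days from row i to x event j (positive = event lies ahead)
  gap <- outer(as.numeric(date), as.numeric(x_dates), function(d, x) x - d)
  data.frame(
    did_x_within_year = as.integer(rowSums(abs(gap) <= days) > 0),
    did_x_next_year   = as.integer(rowSums(gap > 0 & gap <= days) > 0)
  )
}

# Apply the helper per person and stack the pieces back together
pieces <- lapply(split(df, df$Name), function(g) {
  cbind(g, flag_window(g$Date, g$did_x))
})
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

Because the comparison matrix has one cell per (row, x event) pair within a person, this handles any number of x events, but memory grows with rows times events per person, so it may need chunking on large data. It returns 0 rather than NA for rows with no x in the window; the last-row NA convention from the step above is not applied here.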