2012-09-03
2

Exploding rows over a date range in R

I have a dataset of the following form:

df <- data.frame(var1 = c("1976-07-04" , "1980-07-04" , "1984-07-04"), 
        var2 = c('d', 'e', 'f'), 
        freq = 1:3) 

I can expand this data.frame very quickly using indexing:

df.expanded <- df[rep(seq_len(nrow(df)), df$freq), ] 
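To make the indexing step concrete, here is a small sketch of what the index vector looks like for the example data. The name `idx` is introduced only for illustration.

```r
df <- data.frame(var1 = c("1976-07-04", "1980-07-04", "1984-07-04"),
                 var2 = c("d", "e", "f"),
                 freq = 1:3)

# Repeat each row number as many times as its freq value says
idx <- rep(seq_len(nrow(df)), df$freq)
idx

# Indexing with repeated row numbers duplicates the rows
df.expanded <- df[idx, ]
nrow(df.expanded)
```

Row 1 appears once, row 2 twice and row 3 three times, so the expanded data.frame has `sum(df$freq)` rows, each still carrying the original start date.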

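However, instead of replicating the date, I would like to create a sequence of dates, with freq giving the length of that sequence. For example, for row 3 I could create the entries to fill the exploded data.frame with: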

seq(as.Date('1984-7-4'), by = 'days', length = 3) 

Can anyone suggest a quick way of doing this? My approach is to use various lapply functions to do it.

I used a combination of Gavin Simpson's answer and my earlier ideas for a solution.

ExtendedSeq <- function(df, freq.col, date.col, period = 'month') {
    #' An R function to take a data frame that has a frequency column and
    #' explode it to that number of rows, filling the date column with a sequence.
    #' Args:
    #' df: A data.frame to be exploded.
    #' freq.col: A column variable indicating the number of replicates in the
    #'    new dataset to make.
    #' date.col: A column variable indicating the name or position of the date
    #'    variable (the column must already be of class Date).
    #' period: The periodicity to apply to the date (currently unused, see below).

    # Replicate each row freq times
    df.expanded <- df[rep(seq_len(nrow(df)), df[[freq.col]]), ]

    DateExpand <- function(row, df.ex, freq, col.date, period) {
        #' An inner function to build the date sequence for one row.
        #' Args:
        #' row: Index of the row to process
        #' df.ex: A data.frame, to expand
        #' freq: Column indicating the number of replicates to make.
        #' col.date: Column indicating the date variable
        #' Output:
        #' A sequence of dates starting at the row's date.
        times <- df.ex[row, freq]
        # period is not used yet: the step is fixed at one day.
        # It can be wired in later if it becomes row/data driven.
        date.ex <- seq(df.ex[row, col.date], by = "days", length.out = times)
        return(date.ex)
    }

    dates <- lapply(seq_len(nrow(df)),
                    FUN = DateExpand,
                    df.ex = df,
                    freq = freq.col,
                    col.date = date.col,
                    period = period)

    # unlist() drops the Date class, so convert the day counts back to dates
    df.expanded[[date.col]] <- as.Date(unlist(dates), origin = '1970-01-01')
    row.names(df.expanded) <- NULL
    return(df.expanded)
}

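Personally, I don't like having to convert the dates back from the list and supply an origin for that conversion, in case this changes in the future, but I really appreciate the ideas and help.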
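One possible way around the origin concern, offered as a sketch rather than a tested drop-in replacement: `unlist()` strips the Date class, whereas combining the list elements with `do.call(c, ...)` dispatches to the `c()` method for Date objects and keeps it. The names `dates` and `flat` here are illustrative only.

```r
# A list of Date sequences, similar to what lapply() returns above
dates <- list(seq(as.Date("1980-07-04"), by = "day", length.out = 2),
              seq(as.Date("1984-07-04"), by = "day", length.out = 3))

# unlist() returns plain day counts, which is why an origin is needed
class(unlist(dates))

# c() on Date objects keeps the class, so no origin has to be supplied
flat <- do.call(c, dates)
class(flat)
```

With this approach the `as.Date(..., origin = ...)` step would no longer be needed, assuming every list element is already a Date vector.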


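So that nobody reposts what you have already done, could you edit your post to include your current approach? You mention "My approach is to use various lapply functions to do it". – A5C1D2H2I1M1N2O1R2T1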

Answers

3

Here is one way:

extendDF <- function(x) { 
    foo <- function(i, z) { 
     times <- z[i, "freq"] 
     out <- data.frame(seq(z[i, 1], by = "days", length = times), 
          rep(z[i, 2], times), 
          rep(z[i, 3], times)) 
     names(out) <- names(z) 
     out 
    } 
    out <- lapply(seq_len(nrow(x)), FUN = foo, z = x) 
    do.call("rbind", out) 
} 

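This iterates over the indices 1:nrow(df) (i.e. the row indices of df), applying the inline function foo to each row of df. foo() essentially just repeats var2 and freq, freq times each, and uses your seq() call to extend var1. The function makes some assumptions about column order, names and so on, but you can modify it as needed.

The only other point is that it is more efficient to use transform() to convert var1 to a "Date" object in one go, rather than converting it for each row in turn inside extendDF(), so do the conversion first, like this: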

df <- transform(df, var1 = as.Date(var1)) 

Then call extendDF():

extendDF(df) 

This gives:

R> df <- transform(df, var1 = as.Date(var1))
R> extendDF(df)
        var1 var2 freq
1 1976-07-04    d    1
2 1980-07-04    e    2
3 1980-07-05    e    2
4 1984-07-04    f    3
5 1984-07-05    f    3
6 1984-07-06    f    3
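
For comparison, the same expansion can probably also be written without a per-row loop. The following is a sketch of an alternative technique, not the method used in the answer above: it replicates the rows by indexing, then adds a per-row day offset built with base R's `sequence()`. The name `out` is illustrative, and var1 is assumed to already be a Date column.

```r
df <- data.frame(var1 = as.Date(c("1976-07-04", "1980-07-04", "1984-07-04")),
                 var2 = c("d", "e", "f"),
                 freq = 1:3)

# Replicate each row freq times
out <- df[rep(seq_len(nrow(df)), df$freq), ]

# sequence(c(1, 2, 3)) is 1, 1 2, 1 2 3; subtracting 1 gives day offsets
out$var1 <- out$var1 + (sequence(df$freq) - 1)
row.names(out) <- NULL
out
```

Because adding an integer to a Date is already vectorized, this avoids both the `lapply()` call and the `rbind` of many small data frames. It only covers daily steps, though.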

Since you only give it one object, how does 'foo' know what 'i' is and what 'z' is? –


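That is not correct. Look closely at the 'lapply()' call: 'i' is supplied by 'lapply()' itself, one element at a time, and 'z' is passed through '...', as shown by 'z = x' in the call. –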
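
A minimal sketch of that argument-passing mechanism, using a hypothetical helper `show_args` that is not part of the answer:

```r
# A toy function with the same two-argument shape as foo(i, z)
show_args <- function(i, z) paste0("i=", i, ", z=", z)

# lapply() feeds each element of 1:3 to the first argument (i);
# z = "x" is forwarded unchanged through ... on every call
res <- lapply(1:3, FUN = show_args, z = "x")
unlist(res)
```

Each call receives a different `i` and the same `z`, which is how `foo` sees both the row index and the full data frame.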

1

Short, not necessarily fast:

library(plyr) 
adply(df, 1, summarize, var3 = seq(as.Date(var1), by = "days", length = freq)) 
#   var1 var2 freq  var3 
# 1 1976-07-04 d 1 1976-07-04 
# 2 1980-07-04 e 2 1980-07-04 
# 3 1980-07-04 e 2 1980-07-05 
# 4 1984-07-04 f 3 1984-07-04 
# 5 1984-07-04 f 3 1984-07-05 
# 6 1984-07-04 f 3 1984-07-06 

0

Another one:

df <- data.frame(var1 = c("1976-07-04" , "1980-07-04" , "1984-07-04"), var2 = c('d', 'e', 'f'), freq = 1:3) 
df$id <- seq_len(nrow(df)) 
expanded <- apply(df[c("id","var1","freq")], MARGIN=1, FUN=function(x) { 
    result <- seq.Date(as.Date(x["var1"]), length.out = as.integer(x["freq"]), by = "day") 
    data.frame(id = rep(as.integer(x["id"]), length(result)), result=result) 
}) 
expanded <- do.call(rbind, expanded) 
expanded <- plyr::join(x = expanded, y = df, by="id", type = "left", match = "first") 
head(expanded)