2015-11-08

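Creating multiple columns by applying different functions to different variables

I'm having trouble transitioning to data.table. I'm trying to group by some categorical variables and apply a list of functions, each targeting different variables, to create new columns. This seems like it should be easy with mapply or Map, but I can't figure out how to assemble the right subsets to pass to the functions.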

Here is what I have so far:

library(data.table)

set.seed(2015)
dat <- data.table(cat1 = factor('Total'), 
        cat2 = factor(rep(letters[1:4], 5)), 
        cat3 = factor(rep(1:4, each=5)), 
        var1 = sample(20), 
        var2 = sample(20), 
        var3 = sample(20)) 

## I have list of factor columns to group by 
groups <- c(paste0("cat", 1:3)) 
setkeyv(dat, groups) 

## List of functions, and a corresponding list of the column names
## each one is applied to. In this example I should get two new
## columns: sum = sum(var1) and thing = mean(c(var2, var3))
thing <- function(...) mean(c(...), na.rm=TRUE) # arbitrary function 
funs <- list("sum", "thing")      # named functions 
targets <- list("var1", c("var2", "var3"))  # variables 
outnames <- funs         # names of result columns

## Can't get this part 
f <- function(fn, vars) do.call(fn, vars) 
dat[, outnames := Map(f, funs, targets), by=groups] 

For this example, the result should be equivalent to:

dat[, `:=`(sum=sum(var1), thing=thing(var2, var3)), by=groups] 
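A minimal illustration, assuming only base R, of why the Map() attempt above cannot work: Map() hands f() the elements of targets, which are character vectors of column names, not the columns themselves. do.call() insists on a list of arguments, and even when given one it passes the strings through unchanged. The variable names err1 and err2 are hypothetical.

```r
# f() as defined in the question
f <- function(fn, vars) do.call(fn, vars)

# What Map(f, funs, targets) effectively does for the first pair:
# do.call() requires its second argument to be a list, so a bare
# character vector of column names is rejected with an error.
err1 <- tryCatch(f("sum", "var1"), error = conditionMessage)

# Wrapping the name in a list only moves the problem: sum() now
# receives the string "var1", not the values of column var1,
# and errors because it cannot sum a character argument.
err2 <- tryCatch(f("sum", list("var1")), error = conditionMessage)

print(err1)
print(err2)
```

Either way the functions never see the data, which is why the column subsets have to be built from the group's own data before calling f().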

Answer


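We need to subset the dataset's columns based on the column names in the 'targets' list. One way is to loop over the elements of 'targets', subset the data.table with .SD[, x, with=FALSE], and then apply the function to that subset.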

dat[, unlist(outnames) := Map(f, funs, lapply(targets, function(x) 
          .SD[, x, with=FALSE])), by = groups] 
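A self-contained sketch, assuming the data.table package is installed, that rebuilds the question's data, applies the Map() approach from this answer, and checks the result against the hand-written := call from the question. The .SDcols = unique(unlist(targets)) argument is an optional addition that is not in the original answer; it limits .SD to the columns the functions actually use. The names res and ref are illustrative only.

```r
library(data.table)

# Data and helpers exactly as in the question
set.seed(2015)
dat <- data.table(cat1 = factor("Total"),
                  cat2 = factor(rep(letters[1:4], 5)),
                  cat3 = factor(rep(1:4, each = 5)),
                  var1 = sample(20),
                  var2 = sample(20),
                  var3 = sample(20))
groups <- paste0("cat", 1:3)
setkeyv(dat, groups)

thing    <- function(...) mean(c(...), na.rm = TRUE)
funs     <- list("sum", "thing")
targets  <- list("var1", c("var2", "var3"))
outnames <- funs
f <- function(fn, vars) do.call(fn, vars)

# Generic version: for each group, pull the target columns out of .SD
# and hand them (as a list) to the matching function via do.call().
res <- copy(dat)
res[, unlist(outnames) := Map(f, funs,
                              lapply(targets, function(x) .SD[, x, with = FALSE])),
    by = groups, .SDcols = unique(unlist(targets))]

# Hand-written reference from the question
ref <- copy(dat)
ref[, `:=`(sum = sum(var1), thing = thing(var2, var3)), by = groups]

print(all.equal(res, ref))
```

The left-hand side is written as unlist(outnames) rather than outnames because data.table reads a bare symbol to the left of := as a literal column name, so dat[, outnames := ...] would target a column called "outnames". A call such as unlist(outnames) is evaluated instead, and unlist() also turns the list into the character vector of names that := expects.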

Nice, that looks good. I may have a follow-up question, since I also need to subset by index at the same time. Thanks! – jenesaisquoi
