2017-07-23 15 views
1

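Passing user input to 'by' in data.table and to the formula in reshape - r

Below is an example of what I want to do. eval(substitute(*)) works well, as shown here, but it makes the code somewhat hard to read. I wonder whether there is a better approach that I am not aware of.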

I would like to be able to choose the row and column variables of the table (shown at the end). So, if I have

input.col <- 'Gender' 
input.row <- 'Region' 

I would like to pass these as arguments to data.table, rather than hard-coding Region and Gender as below.

library(data.table) 
library(reshape) 
set.seed(5) 
DT <- data.table(Region = sample(x = c('Asia', 'Americas', 'Africa', 'Europe', 'Oceania'), size = 200, replace = T), Weight = runif(n = 200, min = 1, max = 5), Age = round(x = 10*rexp(n = 200, rate = 1), digits = 0), Gender = sample(x = c('Male', 'Female', 'Gender diverse'), size = 200, replace = T, prob = c(0.49, 0.49, 0.02))) 
cast(data = DT[, sum(Weight), .(Region, Gender)], formula = Region~Gender, fun.aggregate = sum, value = 'V1') 

I would like to get the following table:

    Region   Female Gender diverse     Male
1   Africa 32.95019       3.222125 77.50863
2 Americas 49.12787       0.000000 84.97214
3     Asia 41.04879       0.000000 55.43294
4   Europe 45.39469       4.296767 47.76714
5  Oceania 65.89198       1.439075 72.27496

Thanks!

Answers

4

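Here are some possibilities. Apart from (3), they need no package other than data.table. All of them perform the aggregation and the reshaping in a single operation, so there is no need to use by first. If you really do want to use by for some reason, this works: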

cast(data = DT[, sum(Weight), by = c(input.row, input.col)], 
    formula = paste(input.row, "~", input.col), fun.aggregate = sum, value = 'V1') 
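
The call above relies on by accepting a character vector of column names. A minimal sketch of just that step, on a hypothetical toy table (toy and the Total column name are not from the question) small enough to check by hand:

```r
library(data.table)

# Hypothetical toy data, not the data from the question
toy <- data.table(
  Region = c("Asia", "Asia", "Europe", "Europe"),
  Gender = c("Male", "Female", "Male", "Male"),
  Weight = c(1, 2, 3, 4)
)

input.row <- "Region"
input.col <- "Gender"

# by takes a character vector of column names directly,
# so no eval(substitute()) is needed for the grouping step
agg <- toy[, .(Total = sum(Weight)), by = c(input.row, input.col)]
print(agg)
```

The two Europe/Male rows are summed into one group, so agg has three rows.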

1) data.table::dcast

dcast(DT, paste(input.row, "~", input.col), sum, value.var = "Weight") 

giving:

     Region   Female Gender diverse     Male
1:   Africa 32.95019       3.222125 77.50863
2: Americas 49.12787       0.000000 84.97214
3:     Asia 41.04879       0.000000 55.43294
4:   Europe 45.39469       4.296767 47.76714
5:  Oceania 65.89198       1.439075 72.27496
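
If the row and column variables come from user input in several places, the dcast call can be wrapped in a small function. The following is only a sketch: crosstab and the toy table are hypothetical names introduced for illustration. It assumes the supplied names are syntactically valid column names, since they are pasted into a formula.

```r
library(data.table)

# Hypothetical helper: sum `value` for every combination of `row` and `col`.
# Assumes `row` and `col` are syntactically valid column names.
crosstab <- function(dt, row, col, value) {
  fo <- as.formula(paste(row, "~", col))  # e.g. Region ~ Gender
  dcast(dt, fo, fun.aggregate = sum, value.var = value)
}

# Hypothetical toy data, not the data from the question
toy <- data.table(
  Region = c("Asia", "Asia", "Europe", "Europe"),
  Gender = c("Male", "Female", "Male", "Male"),
  Weight = c(1, 2, 3, 4)
)

res <- crosstab(toy, row = "Region", col = "Gender", value = "Weight")
print(res)
```

Combinations that do not occur in the data (Europe/Female here) come out as 0, matching the zeros in the table above.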

2) xtabs. xtabs comes with base R:

fo <- sprintf("Weight ~ %s + %s", input.row, input.col) 
xtabs(fo, DT) 

giving:

          Gender
Region        Female Gender diverse      Male
  Africa   32.950187       3.222125 77.508626
  Americas 49.127873       0.000000 84.972137
  Asia     41.048787       0.000000 55.432941
  Europe   45.394693       4.296767 47.767138
  Oceania  65.891983       1.439075 72.274955
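
As an alternative to building the formula string with sprintf, base R's reformulate can assemble a formula object from character vectors. This is a sketch of that variation, not part of the original answer, run on a hypothetical toy data frame:

```r
# Hypothetical toy data, not the data from the question
toy <- data.frame(
  Region = c("Asia", "Asia", "Europe", "Europe"),
  Gender = c("Male", "Female", "Male", "Male"),
  Weight = c(1, 2, 3, 4)
)

input.row <- "Region"
input.col <- "Gender"

# Builds the formula Weight ~ Region + Gender from the character inputs
fo <- reformulate(c(input.row, input.col), response = "Weight")

# Sum of Weight for each Region/Gender combination
tab <- xtabs(fo, data = toy)
print(tab)
```

Here fo is a real formula object rather than a string, which can be convenient if it is later passed to functions that do not coerce character input.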

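3) reshape::cast. We use the reshape package here because the question does, but note that it has been superseded by the reshape2 package, where one would use dcast; dcast is, however, also implemented in data.table, as in (1).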

cast(DT, paste(input.row, "~", input.col), sum, value = "Weight") 

giving:

    Region   Female Gender diverse     Male
1   Africa 32.95019       3.222125 77.50863
2 Americas 49.12787       0.000000 84.97214
3     Asia 41.04879       0.000000 55.43294
4   Europe 45.39469       4.296767 47.76714
5  Oceania 65.89198       1.439075 72.27496

4) tapply

tapply(DT$Weight, as.list(DT)[c(input.row, input.col)], sum, default = 0) 

giving:

          Gender
Region       Female Gender diverse     Male
  Africa   32.95019       3.222125 77.50863
  Americas 49.12787       0.000000 84.97214
  Asia     41.04879       0.000000 55.43294
  Europe   45.39469       4.296767 47.76714
  Oceania  65.89198       1.439075 72.27496

This is a great list. Its length is also proportional to my ignorance! Thanks. – Ameya

3

You can use get and rename the variables so that they can then be used in the formula:

input.col <- 'Gender' 
input.row <- 'Region' 

dt <- cast(data = DT[, sum(Weight), .(row = get(input.row), col = get(input.col))], 
#          ^^^ ^^^    ^^^ ^^^ 
      formula = row ~ col, fun.aggregate = sum, value = 'V1') 

dt 
#       row   Female Gender diverse     Male
#1   Africa 32.95019       3.222125 77.50863
#2 Americas 49.12787       0.000000 84.97214
#3     Asia 41.04879       0.000000 55.43294
#4   Europe 45.39469       4.296767 47.76714
#5  Oceania 65.89198       1.439075 72.27496