2013-06-18

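Returning multiple columns from a data.table aggregation

I would like to use data.table as an alternative to aggregate() or ddply(), since neither of these scales to large objects as efficiently as I would like. Unfortunately, I have not figured out how to get a vector-returning aggregation function to produce multiple columns in the data.table result. For example: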

# required packages 
library(plyr) 
library(data.table) 

# simulated data 
x <- data.table(value=rnorm(100), g=rep(letters[1:5], each=20)) 

# ddply output that I would like to get from data.table 
ddply(data.frame(x), 'g', function(i) quantile(i$value)) 

  g        0%        25%          50%       75%     100%
1 a -1.547495 -0.7842795  0.202456288 0.6098762 2.223530
2 b -1.366937 -0.4418388 -0.085876995 0.7826863 2.236469
3 c -2.064510 -0.6411390 -0.257526983 0.3213343 1.039053
4 d -1.773933 -0.5493362 -0.007549273 0.4835467 2.116601
5 e -0.780976 -0.2315245  0.194869630 0.6698881 2.207800

# not quite what I am looking for: 
x[, quantile(value), by=g] 

    g           V1
 1: a -1.547495345
 2: a -0.784279536
 3: a  0.202456288
 4: a  0.609876241
 5: a  2.223529739
 6: b -1.366937074
 7: b -0.441838791
 8: b -0.085876995
 9: b  0.782686277
10: b  2.236468703

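Essentially, the output from ddply and aggregate is in wide format, whereas the output from data.table is in long format. Is the answer to reshape the data, or to pass some additional argument to my data.table object?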

This seems to be the same question as the one here: http://stackoverflow.com/questions/16150153/create-columns-from-column-of-list-in-data-table?rq=1 – Dylan

Answer

Try coercing to a list:

> x[, as.list(quantile(value)), by=g] 
   g         0%          25%         50%       75%     100%
1: a -1.7507334 -0.632331909  0.07435249 0.7459778 1.428552
2: b -2.2043481 -0.005652353  0.10534325 0.5769475 1.241754
3: c -1.9313985 -1.120737610 -0.26116926 0.6953009 1.360017
4: d -0.7434664 -0.055232431  0.22062823 1.1864389 3.021124
5: e -2.0101657 -0.468674094  0.20209610 0.6286448 2.433152
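
The reason this works, as far as I understand it, is that data.table stacks a plain vector returned from j into a single column, whereas each element of a list returned from j becomes its own column. Below is a minimal sketch of the same idea with a custom set of quantiles. The seed, the `probs` values, and the replacement column names (`q10`, `q50`, `q90`) are my own choices for illustration and are not part of the question.

```r
library(data.table)

set.seed(1)  # arbitrary seed (an assumption) so the simulated data is reproducible
x <- data.table(value = rnorm(100), g = rep(letters[1:5], each = 20))

# A vector returned from j is stacked into one column (long format);
# a list returned from j is spread so each element becomes a column (wide format).
wide <- x[, as.list(quantile(value, probs = c(0.1, 0.5, 0.9))), by = g]

# Optional: swap the "10%"-style names for syntactic ones (hypothetical names).
setnames(wide, c("g", "q10", "q50", "q90"))
print(wide)
```

Renaming is optional. Names such as `10%` are legal column names, but they need backticks whenever they are referenced in later expressions.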