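Apply a list of functions to each column of a data.table in R

I want to summarize a very large data table by applying a list of statistical functions to each column. I want to use data.table because my previous version, written with plyr, works but is quite slow, and I have read that data.table should be much faster. I tried the following, but I get this error: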
Error in { :
task 1 failed - "task 1 failed - "second argument must be a list""
Here is what I tried:
library(data.table)
library(e1071)
library(nortest)
statistical_tests = list(mean, sd, kurtosis, skewness,
                         lillie.test, shapiro.test)
summary = function(column) {
    result = mapply(do.call, statistical_tests, column)
    print(result)
    return(result)
}
analyse_fits = function(fit_df) {
    #get mean and standard deviation for the three parameters
    print(fit_df)
    setkey(fit_df, type)
    return(fit_df[, lapply(.SD, summary),
                  by=type])
}
analyse_fits(fit_df)
Example data for fit_df:
constant phase visibility type
1: 49927.22 -2.609797e-03 0.8690605 fft
2: 49965.89 -6.783609e-05 0.8702492 fft
3: 50026.44 -1.109387e-03 0.8680235 fft
4: 50063.78 2.640915e-04 0.8697564 fft
5: 50074.89 9.999202e-04 0.8684974 fft
6: 49964.89 -2.075373e-03 0.8708830 fft
7: 50063.56 -9.737554e-04 0.8721360 fft
8: 50044.11 -1.920089e-03 0.8722035 fft
9: 50100.67 -7.487811e-04 0.8706438 fft
10: 49962.11 4.163415e-03 0.8713016 fft
11: 49926.63 -1.473941e-03 0.8687753 ls
12: 49964.98 1.794244e-03 0.8710003 ls
13: 50025.89 -1.315459e-03 0.8698475 ls
14: 50063.40 2.891339e-04 0.8699723 ls
15: 50074.70 1.859353e-03 0.8684841 ls
16: 49964.43 -6.426037e-04 0.8706581 ls
17: 50063.47 -1.646874e-03 0.8715316 ls
18: 50043.48 -1.435637e-03 0.8713584 ls
19: 50100.36 -2.261318e-03 0.8699203 ls
20: 49961.76 3.659428e-03 0.8704063 ls
I'm sure there is a good way to format the output so that this works. Can you help me?
As Carl said, it looks like you are calling one of the functions incorrectly. For 'base' functions, this works anyway: 'DT[, c('mean','sd')), lapply(.SD, function(x) c(mean(x), sd(x)))), by = type]'. – Frank
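The error in the question most likely comes from mapply(do.call, statistical_tests, column): mapply pairs each function with a single element of the column, and do.call requires its second argument to be a list. A minimal sketch of one way around this is shown below, calling each function directly on the whole column. The names stat_funs and apply_stats and the generated sample values are hypothetical, introduced only for illustration. The sketch uses base functions only; shapiro.test returns a test object rather than a number, so only its p-value is kept, and it accepts between 3 and 5000 observations per group. Functions such as kurtosis or skewness from e1071 could presumably be added to the list in the same way.

```r
library(data.table)

# Hypothetical named list of summary functions; each must return one number.
# shapiro.test() returns an "htest" object, so only its p-value is kept.
stat_funs <- list(
  mean      = mean,
  sd        = sd,
  shapiro_p = function(x) shapiro.test(x)$p.value
)

# Apply every function in `funs` to one column and return a numeric vector.
apply_stats <- function(column, funs = stat_funs) {
  vapply(funs, function(f) f(column), numeric(1))
}

# Made-up table with the same layout as fit_df (values are illustrative only).
set.seed(1)
fit_df <- data.table(
  constant   = rnorm(20, mean = 50000, sd = 60),
  phase      = rnorm(20, mean = 0, sd = 2e-3),
  visibility = rnorm(20, mean = 0.87, sd = 1e-3),
  type       = rep(c("fft", "ls"), each = 10)
)

# One row per (type, statistic); one column per original variable.
result <- fit_df[, c(list(stat = names(stat_funs)), lapply(.SD, apply_stats)),
                 by = type]
print(result)
```

The extra stat column labels each row with the name of the function that produced it, so the output stays readable when more functions are added to the list.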