With dplyr >= 0.2 we can use the do() function for this:
library(ggplot2)
library(psych)
library(dplyr)
diamonds %>%
  group_by(cut) %>%
  do(describe(.$price)) %>%
  select(-vars)
#> Source: local data frame [5 x 13]
#> Groups: cut [5]
#>
#> cut n mean sd median trimmed mad min max range skew kurtosis se
#> (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
#> 1 Fair 1610 4358.758 3560.387 3282.0 3695.648 2183.128 337 18574 18237 1.780213 3.067175 88.73281
#> 2 Good 4906 3928.864 3681.590 3050.5 3251.506 2853.264 327 18788 18461 1.721943 3.042550 52.56197
#> 3 Very Good 12082 3981.760 3935.862 2648.0 3243.217 2855.488 336 18818 18482 1.595341 2.235873 35.80721
#> 4 Premium 13791 4584.258 4349.205 3185.0 3822.231 3371.432 326 18823 18497 1.333358 1.072295 37.03497
#> 5 Ideal 21551 3457.542 3808.401 1810.0 2656.136 1630.860 326 18806 18480 1.835587 2.977425 25.94233
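A solution based on the purrr package (in later releases, slice_rows() and by_slice() were moved to the purrrlyr package):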
library(ggplot2)
library(psych)
library(purrr)
diamonds %>%
  slice_rows("cut") %>%
  by_slice(~ describe(.x$price), .collate = "rows")
#> Source: local data frame [5 x 14]
#>
#> cut vars n mean sd median trimmed mad min max range skew kurtosis se
#> (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
#> 1 Fair 1 1610 4358.758 3560.387 3282.0 3695.648 2183.128 337 18574 18237 1.780213 3.067175 88.73281
#> 2 Good 1 4906 3928.864 3681.590 3050.5 3251.506 2853.264 327 18788 18461 1.721943 3.042550 52.56197
#> 3 Very Good 1 12082 3981.760 3935.862 2648.0 3243.217 2855.488 336 18818 18482 1.595341 2.235873 35.80721
#> 4 Premium 1 13791 4584.258 4349.205 3185.0 3822.231 3371.432 326 18823 18497 1.333358 1.072295 37.03497
#> 5 Ideal 1 21551 3457.542 3808.401 1810.0 2656.136 1630.860 326 18806 18480 1.835587 2.977425 25.94233
But it is this simple with data.table:
library(data.table)
as.data.table(diamonds)[, describe(price), by = cut]
#> cut vars n mean sd median trimmed mad min max range skew kurtosis se
#> 1: Ideal 1 21551 3457.542 3808.401 1810.0 2656.136 1630.860 326 18806 18480 1.835587 2.977425 25.94233
#> 2: Premium 1 13791 4584.258 4349.205 3185.0 3822.231 3371.432 326 18823 18497 1.333358 1.072295 37.03497
#> 3: Good 1 4906 3928.864 3681.590 3050.5 3251.506 2853.264 327 18788 18461 1.721943 3.042550 52.56197
#> 4: Very Good 1 12082 3981.760 3935.862 2648.0 3243.217 2855.488 336 18818 18482 1.595341 2.235873 35.80721
#> 5: Fair 1 1610 4358.758 3560.387 3282.0 3695.648 2183.128 337 18574 18237 1.780213 3.067175 88.73281
We can also write our own summary function that returns a list:
fun <- function(x) {
  list(n = length(x),
       min = min(x),
       median = as.numeric(median(x)),
       mean = mean(x),
       sd = sd(x),
       max = max(x))
}
as.data.table(diamonds)[, fun(price), by = cut]
#> cut n min median mean sd max
#> 1: Ideal 21551 326 1810.0 3457.542 3808.401 18806
#> 2: Premium 13791 326 3185.0 4584.258 4349.205 18823
#> 3: Good 4906 327 3050.5 3928.864 3681.590 18788
#> 4: Very Good 12082 336 2648.0 3981.760 3935.862 18818
#> 5: Fair 1610 337 3282.0 4358.758 3560.387 18574
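For readers without data.table installed, the same split-apply-combine idea can be sketched in base R: split the values by group, apply a function that returns a named list, and bind the one-row results together. This is only an illustrative sketch. It uses the built-in mtcars data instead of diamonds so that it runs without any packages, and summarise_by_group is a hypothetical helper name, not a function from any package.

```r
# Summary function returning a named list, same shape as `fun` above.
fun <- function(x) {
  list(n      = length(x),
       min    = min(x),
       median = as.numeric(median(x)),
       mean   = mean(x),
       sd     = sd(x),
       max    = max(x))
}

# Hypothetical helper (an assumption for illustration): apply `f` to
# `values` within each level of `groups` and stack the one-row results
# into a single data frame with a leading `group` column.
summarise_by_group <- function(values, groups, f) {
  pieces <- split(values, groups)                          # one vector per group
  rows   <- lapply(pieces, function(x) as.data.frame(f(x)))  # list -> 1-row data frame
  out    <- do.call(rbind, rows)                           # stack the rows
  data.frame(group = names(pieces), out, row.names = NULL)
}

# Miles per gallon summarised by number of cylinders.
res <- summarise_by_group(mtcars$mpg, mtcars$cyl, fun)
res
```

Because the function returns a named list, each list element becomes one column of the result, which is the same mechanism the data.table call relies on.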
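Can you clarify with some sample data? It is not clear what output you are hoping for. I know that with ddply from the plyr package you can compute multiple values with one call: 'ddply(df, .(group_variables), summarise, name = function(input))' –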
@86smopuiM, this is currently not possible. See [**this issue on the 'dplyr' GitHub**](https://github.com/hadley/dplyr/issues/154). See also [**this question and its comments**](http://stackoverflow.com/questions/21737815/grouped-operations-that-result-in-length-not-equal-to-1-or-length-of-group-in-dp) – Henrik
@crmhaske, the 'diamonds' data ships with 'ggplot2'. – Henrik