dplyr summarise() with multiple return values from a single function

I am wondering whether there is a way to use functions with summarise (dplyr 0.1.2) that return multiple values (for instance, the describe function from the psych package).

If not, is it just because it hasn't been implemented yet, or is there a reason why it wouldn't be a good idea?

Example:

require(psych) 
require(ggplot2) 
require(dplyr) 

dgrp <- group_by(diamonds, cut) 
describe(dgrp$price) 
summarise(dgrp, describe(price))

Produces: Error: expecting a single value


2014-03-07 jzadra

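Can you clarify with some example data? It is not clear what output you are hoping for. I know that with ddply from the plyr package you can get multiple values from a single function: 'ddply(df, .(group_variables), summarise, name = function(input))' –

@86smopuiM, this is not currently possible. See this issue on the [**'dplyr' github**](https://github.com/hadley/dplyr/issues/154). See also [**this question and its comments**](http://stackoverflow.com/questions/21737815/grouped-operations-that-result-in-length-not-equal-to-1-or-length-of-group-in-dp) – Henrik

@crmhaske, the 'diamonds' data ships with 'ggplot2'. – Henrik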

With dplyr >= 0.2, we can use the do function for this:

library(ggplot2) 
library(psych) 
library(dplyr) 
diamonds %>% 
    group_by(cut) %>% 
    do(describe(.$price)) %>% 
    select(-vars) 
#> Source: local data frame [5 x 13]
#> Groups: cut [5]
#>
#>         cut     n     mean       sd median  trimmed      mad   min   max range     skew kurtosis       se
#>      (fctr) (dbl)    (dbl)    (dbl)  (dbl)    (dbl)    (dbl) (dbl) (dbl) (dbl)    (dbl)    (dbl)    (dbl)
#> 1      Fair  1610 4358.758 3560.387 3282.0 3695.648 2183.128   337 18574 18237 1.780213 3.067175 88.73281
#> 2      Good  4906 3928.864 3681.590 3050.5 3251.506 2853.264   327 18788 18461 1.721943 3.042550 52.56197
#> 3 Very Good 12082 3981.760 3935.862 2648.0 3243.217 2855.488   336 18818 18482 1.595341 2.235873 35.80721
#> 4   Premium 13791 4584.258 4349.205 3185.0 3822.231 3371.432   326 18823 18497 1.333358 1.072295 37.03497
#> 5     Ideal 21551 3457.542 3808.401 1810.0 2656.136 1630.860   326 18806 18480 1.835587 2.977425 25.94233
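
The same pattern should work with any function that returns a data frame, not only describe. A minimal sketch, assuming a hypothetical helper summary_df (not part of dplyr or psych) that packs a few base R statistics into a one-row data frame:

```r
library(ggplot2)  # provides the diamonds data set
library(dplyr)

# Hypothetical helper (not from any package): do() needs a data frame back,
# so the statistics are returned as a one-row data.frame rather than a vector.
summary_df <- function(x) {
  data.frame(n = length(x), mean = mean(x), sd = sd(x))
}

res <- diamonds %>%
  group_by(cut) %>%
  do(summary_df(.$price))
res
```

Each group contributes one row, and the grouping column cut is kept alongside n, mean and sd.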

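A solution based on the purrr package (in later purrr releases, slice_rows and by_slice were moved to the purrrlyr package):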

library(ggplot2) 
library(psych) 
library(purrr) 
diamonds %>% 
    slice_rows("cut") %>% 
    by_slice(~ describe(.x$price), .collate = "rows") 
#> Source: local data frame [5 x 14]
#>
#>         cut  vars     n     mean       sd median  trimmed      mad   min   max range     skew kurtosis       se
#>      (fctr) (dbl) (dbl)    (dbl)    (dbl)  (dbl)    (dbl)    (dbl) (dbl) (dbl) (dbl)    (dbl)    (dbl)    (dbl)
#> 1      Fair     1  1610 4358.758 3560.387 3282.0 3695.648 2183.128   337 18574 18237 1.780213 3.067175 88.73281
#> 2      Good     1  4906 3928.864 3681.590 3050.5 3251.506 2853.264   327 18788 18461 1.721943 3.042550 52.56197
#> 3 Very Good     1 12082 3981.760 3935.862 2648.0 3243.217 2855.488   336 18818 18482 1.595341 2.235873 35.80721
#> 4   Premium     1 13791 4584.258 4349.205 3185.0 3822.231 3371.432   326 18823 18497 1.333358 1.072295 37.03497
#> 5     Ideal     1 21551 3457.542 3808.401 1810.0 2656.136 1630.860   326 18806 18480 1.835587 2.977425 25.94233

It is just as simple with data.table, though:

library(data.table)
as.data.table(diamonds)[, describe(price), by = cut]
#>          cut vars     n     mean       sd median  trimmed      mad min   max range     skew kurtosis       se
#> 1:     Ideal    1 21551 3457.542 3808.401 1810.0 2656.136 1630.860 326 18806 18480 1.835587 2.977425 25.94233
#> 2:   Premium    1 13791 4584.258 4349.205 3185.0 3822.231 3371.432 326 18823 18497 1.333358 1.072295 37.03497
#> 3:      Good    1  4906 3928.864 3681.590 3050.5 3251.506 2853.264 327 18788 18461 1.721943 3.042550 52.56197
#> 4: Very Good    1 12082 3981.760 3935.862 2648.0 3243.217 2855.488 336 18818 18482 1.595341 2.235873 35.80721
#> 5:      Fair    1  1610 4358.758 3560.387 3282.0 3695.648 2183.128 337 18574 18237 1.780213 3.067175 88.73281

We can also write our own summary function that returns a list:

fun <- function(x) {
    list(n = length(x),
         min = min(x),
         median = as.numeric(median(x)),
         mean = mean(x),
         sd = sd(x),
         max = max(x))
}
as.data.table(diamonds)[, fun(price), by = cut]
#>          cut     n min median     mean       sd   max
#> 1:     Ideal 21551 326 1810.0 3457.542 3808.401 18806
#> 2:   Premium 13791 326 3185.0 4584.258 4349.205 18823
#> 3:      Good  4906 327 3050.5 3928.864 3681.590 18788
#> 4: Very Good 12082 336 2648.0 3981.760 3935.862 18818
#> 5:      Fair  1610 337 3282.0 4358.758 3560.387 18574
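
On newer dplyr releases (1.0.0 and later, to my knowledge), summarise itself accepts an unnamed expression that returns a one-row data frame and spreads it into columns, so do should no longer be needed for this. A minimal sketch under that assumption; fun_df is a hypothetical data-frame variant of fun above, not something from a package:

```r
library(ggplot2)  # provides the diamonds data set
library(dplyr)    # assumes dplyr >= 1.0.0

# Hypothetical helper: same statistics as fun(), but returned as a
# one-row data frame so summarise() can unpack it into columns.
fun_df <- function(x) {
  data.frame(n = length(x),
             min = min(x),
             median = as.numeric(median(x)),
             mean = mean(x),
             sd = sd(x),
             max = max(x))
}

res <- diamonds %>%
  group_by(cut) %>%
  summarise(fun_df(price))
res
```

The expression inside summarise is deliberately left unnamed; giving it a name would, as far as I understand, keep the result as a single data-frame column instead of separate columns.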


2014-09-13 16:09:12

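Is there a way to combine 'do' and 'summarise'? – kungfujam

@Artem, how do I combine do and summarise when I want to summarise with a custom function? – alily

@alily Just use your function in place of 'describe', as shown in my answer. Why do you want to use 'summarise'? –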
