Summarize and list custom indices in dplyr

I want to output grouped summary variables together with a list of the corresponding identifier variables.

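Using the dplyr::starwars dataset as an example, I want to count the characters with "light" skin color by gender, with each group's vector of matching names in a separate output column.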

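In the real-world use case there will be multiple conditions to summarise, and the unique identifier might be a subjectID, studyID, etc. I'm open to data.table solutions, but I prefer vector-based solutions that are R Shiny friendly and easy to convert into functions.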

Example using dplyr::starwars:

library(dplyr)

starwars %>% 
    filter(species %in% c("Human", "Droid")) %>% 
    group_by(gender) %>% 
    summarise(
        skin = sum(skin_color == "light", na.rm = TRUE), 
        hair = sum(hair_color == "brown", na.rm = TRUE)
    )

Desired output:

gender skin hair skinname                                                    hairname
female    6    6 femname1, femname2, femname3, femname4, femname5, femname6  femhname1, femhname2, femhname3, femhname4, femhname5, femhname6
male      5    8 mname1, mname2, mname3, mname4, mname5                      mhname1, mhname2, mhname3, mhname4, mhname5, mhname6, mhname7, mhname8
none      0    0
<NA>      0    0

This output would then be transposed with t(), and paste() would be used to build a hover display of the matching names in DT (DataTables).

I think I need something in the summarise/mutate step like

skinname = as.list(.$name[which(skin_color == "light")])

or perhaps a custom function with do.call.
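One caveat about the sketch above: inside summarise(), a bare column name such as `name` already refers to the current group's rows, while `.$name` (the magrittr placeholder) refers to the whole piped data frame, so positions computed within a group would index the full column and pick the wrong names. A minimal sketch of the "custom function" idea, assuming dplyr >= 0.8.3 (for the `{{ }}` embracing operator); `count_with_ids()` and the `toy` data frame are hypothetical names used only for illustration:

```r
library(dplyr)

# Hypothetical toy data standing in for starwars (illustration only)
toy <- tibble(
  name       = c("A", "B", "C", "D", "E"),
  gender     = c("female", "female", "male", "male", "none"),
  skin_color = c("light", "dark", "light", "light", "metal")
)

# Hypothetical helper: per group, count the rows where `cond` is TRUE and
# collapse the matching ids into one comma-separated string.
# which() drops NA results, so rows where `cond` is NA are neither listed nor counted.
count_with_ids <- function(data, group, id, cond) {
  data %>%
    group_by({{ group }}) %>%
    summarise(
      n   = sum({{ cond }}, na.rm = TRUE),
      ids = toString(({{ id }})[which({{ cond }})])
    )
}

res <- count_with_ids(toy, gender, name, skin_color == "light")
res
```

Because the group, id, and condition are passed as arguments, the same helper could be reused for subjectID/studyID-style identifiers; combining several conditions would still need one call per condition (or the multi-column summarise approach used in the answers).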

Source

2017-07-19 Raoul Duke

If you want a nested data.frame, you can use tidyr::nest:

library(tidyverse) 

starwars %>% 
    filter(species %in% c("Human", "Droid"), 
      skin_color == 'light') %>% 
    group_by(gender) %>% 
    group_by(skin = n(), add = TRUE) %>% 
    nest(name) 
#> # A tibble: 2 x 3 
#> gender skin    data 
#> <chr> <int>   <list> 
#> 1 female  6 <tibble [6 x 1]> 
#> 2 male  5 <tibble [5 x 1]>

or, if you just want nested vectors, summarise with list:

starwars %>% 
    filter(species %in% c("Human", "Droid"), 
      skin_color == 'light') %>% 
    group_by(gender) %>% 
    summarise(skin = n(), 
       name = list(name)) 
#> # A tibble: 2 x 3 
#> gender skin  name 
#> <chr> <int> <list> 
#> 1 female  6 <chr [6]> 
#> 2 male  5 <chr [5]>

or, if you want to keep the empty groups, subset instead of filtering:

starwars %>% 
    filter(species %in% c("Human", "Droid")) %>% 
    group_by(gender) %>% 
    summarise(
     skin = sum(skin_color == "light"), 
     name = list(name[skin_color == 'light']) 
    ) 
#> # A tibble: 4 x 3 
#> gender skin  name 
#> <chr> <int> <list> 
#> 1 female  6 <chr [6]> 
#> 2 male  5 <chr [5]> 
#> 3 none  0 <chr [0]> 
#> 4 <NA>  0 <chr [0]>

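If you want to collapse the names into a single string, toString will do the job, but if you plan to split them apart again later, make sure the names themselves don't contain commas.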
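Combining the subsetting approach with toString() gives the layout of the desired output in the question: one count column and one name column per condition, with empty groups kept. A minimal sketch on hypothetical toy data (not the real starwars values) so the result is easy to check:

```r
library(dplyr)

# Hypothetical toy data standing in for starwars (illustration only)
toy <- tibble(
  name       = c("A", "B", "C", "D", "E"),
  gender     = c("female", "female", "male", "male", "none"),
  skin_color = c("light", "dark", "light", "light", "metal"),
  hair_color = c("brown", "brown", NA, "black", NA)
)

res <- toy %>%
  group_by(gender) %>%
  summarise(
    # One count per condition; NA comparisons are dropped by na.rm = TRUE
    skin = sum(skin_color == "light", na.rm = TRUE),
    hair = sum(hair_color == "brown", na.rm = TRUE),
    # Matching names collapsed to "a, b, c"; which() skips NA comparisons
    skinname = toString(name[which(skin_color == "light")]),
    hairname = toString(name[which(hair_color == "brown")])
  )
res
```

Indexing with which() rather than the raw logical vector keeps NA comparisons (such as a missing hair_color) from turning into NA entries in the name list, and groups with no match get a count of 0 and an empty string.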

Source

2017-07-19 16:09:53 alistaire

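Thanks, all. I should have mentioned that the goal is to create a grouped summary table with counts for multiple conditions, so I don't think moving 'skin_color == light' into the 'filter()' step will work. I'll edit the question to clarify.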

Then it sounds like you should group rather than filter. – alistaire

I believe your last option is the solution I'm looking for, thanks.
