Group by ID and keep only the group with the highest mean

I have a data frame as follows. I want to group by ID and keep only the group with the highest mean.

a <- data.frame(group =c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5), count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L,114L, 134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L)) 

    group count 
1  1 12 
2  1 80 
3  1 102 
4  1 97 
5  2 118 
6  2 115 
7  2  4 
8  2 13 
9  3 136 
10  3 114 
11  3 134 
12  3 126 
13  4 128 
14  4 63 
15  4 118 
16  4  1 
17  5 28 
18  5 18 
19  5 18 
20  5 23

I used the following command,

a %>% group_by(group) %>% summarise(mean(count)) 

    group mean(count) 
    (dbl)  (dbl) 
1  1  72.75 
2  2  62.50 
3  3  127.50 
4  4  77.50 
5  5  21.75

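I want to keep only the entries that belong to the group with the highest mean. Here the third group has the highest mean, so my output should be,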

group count 
1  3 136 
2  3 114 
3  3 134 
4  3 126

Can anyone give me some ideas on how to do this?

2016-06-08 haimen

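There are already plenty of options. That said, with your existing approach you only need to name the summary column, e.g. 'summarise(mc = mean(count))', and then add '%>% slice(which.max(mc)) %>% semi_join(a, ., "group")'.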
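The suggestion in this comment refers to a column called mc, which the summarise() call in the question does not create. Below is a minimal sketch of the whole pipeline, assuming the summary column is named mc and that dplyr is installed. The data frame is rebuilt here with the question's values (group is written with rep() for brevity) so the block runs on its own.

```r
library(dplyr)

# Same values as the data frame in the question
a <- data.frame(
  group = rep(1:5, each = 4),
  count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L, 114L,
            134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L)
)

result <- a %>%
  group_by(group) %>%
  summarise(mc = mean(count)) %>%  # one row per group, mean stored as mc
  slice(which.max(mc)) %>%         # keep the single row with the largest mean
  semi_join(a, ., by = "group")    # rows of a whose group matches that row

result
```

Because semi_join() only filters a, the result keeps the original columns and no helper column has to be dropped afterwards.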

If you'd like to see a base R solution, you can use which.max and aggregate:

# calculate means by group 
myMeans <- aggregate(count~group, a, FUN=mean) 

# select the group with the max mean 
maxMeanGroup <- a[a$group == myMeans[which.max(myMeans$count),]$group, ]
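As a variation on the base R idea above (not part of the original answer), ave() can attach each row's group mean directly, so no separate summary table is needed. This is only a sketch. The data frame is rebuilt with the question's values, and the name group_mean is chosen here for illustration.

```r
# Same values as the data frame in the question
a <- data.frame(
  group = rep(1:5, each = 4),
  count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L, 114L,
            134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L)
)

# ave() returns, for every row, the mean of count within that row's group
# (mean is the default FUN)
group_mean <- ave(as.numeric(a$count), a$group)

# keep the rows whose group mean equals the overall largest group mean
result <- a[group_mean == max(group_mean), ]

result
```

One difference worth knowing: which.max() returns only the first maximum, whereas comparing against max() keeps every group that ties for the highest mean.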

As a second approach, you could try data.table:

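library(data.table) 
setDT(a)  # converts a to a data.table by reference

# compute the mean per group, then look up the label of the group with the largest mean
a[group == a[, list(count = mean(count)), by = group 
      ][which.max(count), group], ]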

group count 
1:  3 136 
2:  3 114 
3:  3 134 
4:  3 126

2016-06-08 19:08:56 lmo

Your base R approach could be rewritten as 'subset(a, group %in% with(aggregate(count ~ group, a, mean), group[which.max(count)]))', for people who like 'subset' and 'with'. –

@docendodiscimus Using 'with' on the output of 'aggregate' is a cool idea that I had never seen before. Thanks for the tip. – lmo

Using dplyr:

a %>% group_by(group) %>% 
    mutate(mc = mean(count)) %>% ungroup() %>% 
    filter(mc == max(mc)) %>% select(-mc) 

Source: local data frame [4 x 2] 

    group count 
    (dbl) (int) 
1  3 136 
2  3 114 
3  3 134 
4  3 126

Another option with data.table:

a[a[, .(mc = mean(count)), .(group)][mc == max(mc), -"mc", with = FALSE], on = "group"] 
    group count 
1:  3 136 
2:  3 114 
3:  3 134 
4:  3 126

2016-06-08 19:02:49 Psidom

You want mutate rather than summarize, so that you keep all the observations in your data.frame.

new_data <- a %>% group_by(group) %>% 
    ##compute average count within groups 
    mutate(AvgCt = mean(count)) %>% 
    ungroup() %>% 
    ##filter, looking for the maximum of the created variable 
    filter(AvgCt == max(AvgCt))

Then your final output is

> new_data 
Source: local data frame [4 x 3] 

    group count AvgCt 
    (dbl) (int) (dbl) 
1  3 136 127.5 
2  3 114 127.5 
3  3 134 127.5 
4  3 126 127.5

And, if you'd like to remove the computed variable,

new_data <- new_data %>% select(-AvgCt) 

> new_data 
Source: local data frame [4 x 2] 

    group count 
    (dbl) (int) 
1  3 136 
2  3 114 
3  3 134 
4  3 126

2016-06-08 19:03:09 BarkleyBG

Maybe some xtabs/tabulate fun as well (if the groups aren't just numeric, you will need to add names to the which.max call)

a[a$group == which.max(xtabs(count ~ group, a)/tabulate(a$group)),] 
# group count 
# 9  3 136 
# 10  3 114 
# 11  3 134 
# 12  3 126

Or combined with rowsum

a[a$group == which.max(rowsum.default(a$count, a$group)/tabulate(a$group)), ] 
# group count 
# 9  3 136 
# 10  3 114 
# 11  3 134 
# 12  3 126
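The caveat about non-numeric groups can be sketched as follows. which.max() returns a position, which matches the group label only when the groups happen to be 1, 2, 3, and so on. Wrapping the call in names() recovers the label itself. The data frame b below is hypothetical, not from the question, and table() is used in place of tabulate() on the assumption that the group column holds character labels.

```r
# Hypothetical data with character group labels (not from the question)
b <- data.frame(
  group = rep(c("x", "y", "z"), each = 2),
  count = c(1, 3, 10, 20, 5, 7)
)

# xtabs() gives per-group sums and table() gives per-group sizes, both in
# sorted label order, so their ratio is a named vector of group means
group_means <- xtabs(count ~ group, b) / table(b$group)

# which.max() gives a position; names() turns it back into the group label
best <- names(which.max(group_means))

b[b$group == best, ]
```

With the label in hand, the comparison against b$group no longer depends on how the groups are numbered or ordered.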

2016-06-08 19:34:31
