2016-02-12 36 views
0

Subset a data frame based on the maximum and minimum values in one column (in R)

For this example data frame:

df1 <- structure(list(id = 1:21, region = structure(c(1L, 1L, 1L, 1L, 
                2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 
                4L), .Label = c("a", "b", "c", "d"), class = "factor"), weight = c(0.35, 
                                0.65, 0.99, 1.5, 3.2, 2.1, 1.3, 3.2, 1.3, 2, 0.6, 0.6, 0.6, 0.45, 
                                1, 1.2, 1.4, 2, 1.3, 1, 2), condition = c(0L, 1L, 0L, 1L, 0L, 
                                           0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L 
                                )), .Names = c("id", "region", "weight", "condition"), class = "data.frame", row.names = c(NA, 
                                                       -21L)) 

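I want to exclude the regions that have neither the highest nor the lowest share of '1's in the per-region result variable. For example, I would normally do: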

library(data.table)
summary <- setDT(df1)[, .(.result = weighted.mean(condition == 1,
     w = weight) * 100), by = region]

This gives me the following summary:

   region  .result
1:      a 61.60458
2:      b 39.69466
3:      c 50.56180
4:      d 61.03896

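So I would drop regions c and d from the data frame df1, keeping only a (the highest) and b (the lowest).

Is it possible to do this in one step, without manually inspecting the summary data frame?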

Answer

3

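As I understand it, you want to exclude every region whose value is neither the highest nor the lowest. This isn't a one-liner, but if you add the following to your code you should get what you want: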

incl <- summary[c(which.min(.result), which.max(.result)),region] 
newdf <- df1[region %in% incl,] 
newdf 

    id region weight condition
 1:  5      b   3.20         0
 2:  6      b   2.10         0
 3:  7      b   1.30         0
 4:  8      b   3.20         1
 5:  9      b   1.30         0
 6: 10      b   2.00         1
 7:  1      a   0.35         0
 8:  2      a   0.65         1
 9:  3      a   0.99         0
10:  4      a   1.50         1
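
If a single chained expression is preferred, the summary step and the min/max lookup can be combined. The following is only a sketch under a few assumptions: the names dt, keep and newdt are made up for illustration, the data is rebuilt with the same values as df1, and the summary column is called result rather than .result.

```r
library(data.table)

# Same values as df1 in the question, rebuilt here so the sketch is self-contained
dt <- data.table(
  id = 1:21,
  region = factor(rep(c("a", "b", "c", "d"), times = c(4, 6, 6, 5))),
  weight = c(0.35, 0.65, 0.99, 1.5, 3.2, 2.1, 1.3, 3.2, 1.3, 2, 0.6, 0.6,
             0.6, 0.45, 1, 1.2, 1.4, 2, 1.3, 1, 2),
  condition = c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
                0L, 1L, 1L, 1L, 1L, 0L, 0L)
)

# Compute the weighted percentage per region, then chain a second [ ] call
# that keeps only the regions holding the minimum or maximum value
keep <- dt[, .(result = weighted.mean(condition == 1, w = weight) * 100),
           by = region][result %in% range(result), as.character(region)]

# Subset the original rows to those regions
newdt <- dt[region %in% keep]
```

One difference from the which.min/which.max version: if several regions tie for the minimum or the maximum, this sketch keeps all of them, whereas which.min and which.max return only the first match.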