2014-10-30 60 views
3

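Selecting the top N values within a group in a column using R

I need to select the top two values for each group [yearmonth] value from the following data frame in R. I have already sorted the data by count and yearmonth. How can I achieve this with the following data?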

   yearmonth            name count
1     201310           Dovas     5
2     201310         Indulgd     2
3     201310         Justina     1
4     201310          Jolita     1
5     201311 Shahrukh Sheikh     1
6     201311           Dovas    29
7     201311         Justina    13
8     201311            Lina     8
9     201312         sUPERED     7
10    201312     John Hansen     7
11    201312         Lina D.     6
12    201312       joanna1st     5

Answers

7

Or use data.table (mydf from @jazzurro's post). Some options:

library(data.table)
setDT(mydf)[order(yearmonth, -count), .SD[1:2], by = yearmonth]

Or

setDT(mydf)[mydf[order(yearmonth, -count), .I[1:2], by=yearmonth]$V1,] 

Or

setorder(setkey(setDT(mydf), yearmonth), yearmonth, -count)[
  , .SD[1:2], by = yearmonth]
#    yearmonth        name count
# 1:    201310       Dovas     5
# 2:    201310     Indulgd     2
# 3:    201311       Dovas    29
# 4:    201311     Justina    13
# 5:    201312     sUPERED     7
# 6:    201312 John Hansen     7
+1

More data.table examples to learn from. +1 :) – jazzurro 2014-10-30 07:01:52

+0

@jazzurro I guess `setorder` would be a bit faster. – akrun 2014-10-30 07:03:53

+0

Got it. Thank you very much. I will note them down along with your comment. Thanks for your great support. – jazzurro 2014-10-30 07:07:25

4

Here is one way:

library(dplyr) 

mydf %>% 
    group_by(yearmonth) %>% 
    arrange(desc(count)) %>% 
    slice(1:2) 

#   yearmonth        name count
# 1    201310       Dovas     5
# 2    201310     Indulgd     2
# 3    201311       Dovas    29
# 4    201311     Justina    13
# 5    201312     sUPERED     7
# 6    201312 John Hansen     7

DATA

mydf <- data.frame(yearmonth = rep(c("201310", "201311", "201312"), each = 4), 
        name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh", 
         "Dovas", "Justina", "Lina", "sUPERED", "John Hansen", 
         "Lina D.", "joanna1st"), 
        count = c(5,2,1,1,1,29,13,8,7,7,6,5), 
        stringsAsFactors = FALSE) 
1

Using base R, you can do this:

# sort the data, skip if already done 
df <- df[order(df$yearmonth, df$count, decreasing = TRUE),] 

Then take the top two rows of each group:

df[ave(df$count, df$yearmonth, FUN = seq_along) <= 2, ]
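
For anyone who wants to run the base R approach end to end, here is a minimal sketch that combines the two steps using the mydf sample data from the DATA block in this thread. The helper name top_n_by_group and its arguments are hypothetical, made up for illustration; they do not come from any package. It sorts yearmonth ascending and count descending, which differs slightly from the decreasing = TRUE sort above but selects the same rows.

```r
# Sample data copied from the DATA block in this thread
mydf <- data.frame(
  yearmonth = rep(c("201310", "201311", "201312"), each = 4),
  name = c("Dovas", "Indulgd", "Justina", "Jolita", "Shahrukh Sheikh",
           "Dovas", "Justina", "Lina", "sUPERED", "John Hansen",
           "Lina D.", "joanna1st"),
  count = c(5, 2, 1, 1, 1, 29, 13, 8, 7, 7, 6, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from any package): keep the n rows with the
# largest value_col within each level of group_col
top_n_by_group <- function(data, group_col, value_col, n = 2) {
  # Sort by group (ascending), then by value (descending) within each group
  sorted <- data[order(data[[group_col]], -data[[value_col]]), ]
  # Number the rows 1, 2, 3, ... within each group, in sorted order
  rank_in_group <- ave(seq_len(nrow(sorted)), sorted[[group_col]],
                       FUN = seq_along)
  # Keep only the first n rows of each group
  sorted[rank_in_group <= n, ]
}

top2 <- top_n_by_group(mydf, "yearmonth", "count", n = 2)
print(top2)
```

Because the filter is a logical test on the within-group rank rather than a positional index such as 1:2, a group with fewer than n rows is returned as-is, with no NA rows added.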