2017-05-14 37 views
3

R - Get the highest value for each ID

I have the following DF:

>> animals_df: 

animal_name age 
cat    1 
cat    1 
cat    2 
cat    3 
cat    3 
dog    1 
dog    1 
dog    3 
dog    4 
dog    4 
dog    4 
horse   1 
horse   3 
horse   5 
horse   5 
horse   5 

I want to extract only the animals with the highest age within each species. So I want the following output:

animal_name age 
    cat   3 
    cat   3 
    dog   4 
    dog   4 
    dog   4 
    horse  5 
    horse  5 
    horse  5 

I have tried using:

animals_df = do.call(rbind,lapply(split(animals_df, animals_df$animal_name), function(x) tail(x, 1))) 

But this only gives one instance of each animal, like this:

animal_name age 
    cat   3 
    dog   4 
    horse  5 
+3

'dat[with(dat, age == ave(age, animal_name, FUN = max)), ]' in base R. – thelatemail
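
As a minimal, self-contained sketch of the base R idea in the comment above: `ave()` computes the per-group maximum for every row, and the data frame is then subset to the rows that match it. The data frame is rebuilt here from the question's table, and the names `group_max`, `oldest`, and `oldest_split` are illustrative rather than from the original post. The last step shows, as an assumption about what the asker intended, how the `split`/`lapply` attempt could keep ties by filtering instead of calling `tail(x, 1)`, which returns only the last row of each group.

```r
# Recreate the example data from the question
animals_df <- data.frame(
  animal_name = c(rep("cat", 5), rep("dog", 6), rep("horse", 5)),
  age = c(1, 1, 2, 3, 3, 1, 1, 3, 4, 4, 4, 1, 3, 5, 5, 5)
)

# ave() returns, for every row, the max age within that row's animal_name group
group_max <- ave(animals_df$age, animals_df$animal_name, FUN = max)

# Keep every row whose age equals its group's maximum (ties are retained)
oldest <- animals_df[animals_df$age == group_max, ]

# Same idea applied to the split/lapply attempt: filter instead of tail(x, 1)
oldest_split <- do.call(
  rbind,
  lapply(split(animals_df, animals_df$animal_name),
         function(x) x[x$age == max(x$age), ])
)
```

Both `oldest` and `oldest_split` should hold all rows tied for the maximum age in each group, not just one row per animal.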

Answers

4

This is easy with dplyr/tidyverse:

library(tidyverse) 

# How I read your data in, ignore since you already have your data available 
df = read.table(file="clipboard", header=TRUE) 
df %>% 
    group_by(animal_name) %>% 
    filter(age == max(age)) 

# Output: 
Source: local data frame [8 x 2] 
Groups: animal_name [3] 

    animal_name age 
     <fctr> <int> 
1   cat  3 
2   cat  3 
3   dog  4 
4   dog  4 
5   dog  4 
6  horse  5 
7  horse  5 
8  horse  5 
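
A possible alternative, sketched under the assumption that dplyr 1.0.0 or later is installed (newer than this answer): `slice_max()` keeps the rows with the largest value per group and, with its default `with_ties = TRUE`, retains all tied rows. The data frame is rebuilt from the question's table, and the name `oldest` is illustrative.

```r
library(dplyr)

# Recreate the example data from the question
df <- data.frame(
  animal_name = c(rep("cat", 5), rep("dog", 6), rep("horse", 5)),
  age = c(1, 1, 2, 3, 3, 1, 1, 3, 4, 4, 4, 1, 3, 5, 5, 5)
)

# For each animal_name, keep the rows with the largest age;
# with_ties = TRUE (the default) keeps every row tied for the maximum
oldest <- df %>%
  group_by(animal_name) %>%
  slice_max(age, n = 1) %>%
  ungroup()
```

This should select the same rows as the `filter(age == max(age))` version; which one reads better is largely a matter of taste.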
1

Another option, using data.table:

library(data.table) 
setDT(df) 
df[, .SD[which(age == max(age))], by = animal_name] 

#  animal_name age 
#1:   cat 3 
#2:   cat 3 
#3:   dog 4 
#4:   dog 4 
#5:   dog 4 
#6:  horse 5 
#7:  horse 5 
#8:  horse 5