2015-06-15

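dplyr: get the rows with the minimum and maximum of a variable

I have a data.frame, and I want to return the observations with the minimum and maximum time values within each set_nbr group.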

df<- data.frame(
    time=c(24594.55, 29495.45, 24594.55, 39297.27, 24594.55, 34396.36, 19693.64, 14792.73, 29495.45), 
    Mz=c(-0.04729751, -0.50902297, -0.04376393, -0.22218980, -0.36407263, -0.38341534, -0.34597255, -0.01480776, -0.00999671), 
    set_nbr=c(1, 1,1, 2, 2, 2, 3, 3, 3))   


library(dplyr) 

min_time <- df %>% 
    group_by(set_nbr) %>% 
    slice(which(Mz<0))%>% 
    filter(rank(time,ties.method="min")==1)%>% 
    distinct 


min_time 

##Source: local data frame [3 x 3] 
##Groups: set_nbr 

##       time          Mz set_nbr
## 1 24594.55 -0.04729751       1
## 2 24594.55 -0.36407263       2
## 3 14792.73 -0.01480776       3

This works, but when I try to get max_time, I get a strange result:

max_time <- df %>% 
    group_by(set_nbr) %>% 
    slice(which(Mz<0))%>% 
    filter(rank(time,ties.method="max")==1)%>% 
    distinct 

max_time 

##Source: local data frame [2 x 3] 
##Groups: set_nbr 

##       time          Mz set_nbr
## 1 24594.55 -0.36407263       2
## 2 14792.73 -0.01480776       3

The row for set_nbr 1 is missing, and the max time values are incorrect. I don't know why.

Expected output:

max_time 

##       time          Mz set_nbr
## 1 29495.45 -0.50902297       1
## 2 39297.27 -0.22218980       2
## 3 29495.45 -0.00999671       3

I'm not sure what you're doing with `filter` + `rank`, but do you know about the `arrange` function? Something like `df %>% arrange(time) %>% group_by(set_nbr) %>% slice(c(1, n()))` – Frank
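
The suggestion in this comment can be sketched as follows. This is a minimal example that assumes the `df` from the question; `min_max_time` is only an illustrative name. Once the rows are sorted by `time`, the first row of each group holds the smallest time and the last row holds the largest, so one `slice()` call returns both.

```r
library(dplyr)

# Data from the question
df <- data.frame(
    time = c(24594.55, 29495.45, 24594.55, 39297.27, 24594.55,
             34396.36, 19693.64, 14792.73, 29495.45),
    Mz = c(-0.04729751, -0.50902297, -0.04376393, -0.22218980, -0.36407263,
           -0.38341534, -0.34597255, -0.01480776, -0.00999671),
    set_nbr = c(1, 1, 1, 2, 2, 2, 3, 3, 3))

# min_max_time is a hypothetical name used for illustration
min_max_time <- df %>%
    arrange(time) %>%           # sort ascending by time
    group_by(set_nbr) %>%
    slice(c(1, n())) %>%        # first row (min) and last row (max) per group
    ungroup()

min_max_time
```

Each group here has three rows, so the result has two rows per group. A group with a single row may need separate handling, because positions 1 and n() would then refer to the same row.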

Answer


Try

df %>% 
    group_by(set_nbr) %>% 
    filter(time==max(time)) 
#  time   Mz set_nbr 
#1 29495.45 -0.50902297  1 
#2 39297.27 -0.22218980  2 
#3 29495.45 -0.00999671  3 

Or

df %>% 
    group_by(set_nbr) %>% 
    slice(which.max(time)) 
#  time   Mz set_nbr 
#1 29495.45 -0.50902297  1 
#2 39297.27 -0.22218980  2 
#3 29495.45 -0.00999671  3 
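
The two approaches above behave differently when the extreme value is tied within a group, which the question's data shows for the minimum time of `set_nbr` 1. The sketch below assumes the `df` from the question; the names `min_all`, `min_first` and `max_rows` are illustrative only. The last call uses `slice_max()`, which assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Data from the question
df <- data.frame(
    time = c(24594.55, 29495.45, 24594.55, 39297.27, 24594.55,
             34396.36, 19693.64, 14792.73, 29495.45),
    Mz = c(-0.04729751, -0.50902297, -0.04376393, -0.22218980, -0.36407263,
           -0.38341534, -0.34597255, -0.01480776, -0.00999671),
    set_nbr = c(1, 1, 1, 2, 2, 2, 3, 3, 3))

# filter() keeps every row tied for the group minimum,
# so set_nbr 1 contributes two rows
min_all <- df %>%
    group_by(set_nbr) %>%
    filter(time == min(time))

# which.min() returns only the first position of the minimum,
# so each group contributes exactly one row
min_first <- df %>%
    group_by(set_nbr) %>%
    slice(which.min(time))

# Assumes dplyr >= 1.0.0: with_ties = FALSE also keeps one row per group
max_rows <- df %>%
    group_by(set_nbr) %>%
    slice_max(time, n = 1, with_ties = FALSE)

nrow(min_all)    # 4
nrow(min_first)  # 3
nrow(max_rows)   # 3
```

Which variant fits depends on whether tied rows should all be returned or collapsed to a single row per group.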

As for why your code didn't work:

df %>% 
    group_by(set_nbr) %>% 
    slice(which(Mz <0)) %>% 
    mutate(rn = rank(time, ties.method='max')) 
#  time   Mz set_nbr rn 
#1 24594.55 -0.04729751  1 2 
#2 29495.45 -0.50902297  1 3 
#3 24594.55 -0.04376393  1 2 
#4 39297.27 -0.22218980  2 3 
#5 24594.55 -0.36407263  2 1 
#6 34396.36 -0.38341534  2 2 
#7 19693.64 -0.34597255  3 2 
#8 14792.73 -0.01480776  3 1 
#9 29495.45 -0.00999671  3 3 

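If you look at the output, `rn` has no value of 1 for `set_nbr` group 1: the two smallest times are tied, and `ties.method='max'` gives both of them rank 2. Note also that `rank()` numbers values in ascending order, so `rn == 1` marks the smallest time, not the largest, which is why groups 2 and 3 returned their minimum rows. To get the maximum, rank the negated times instead: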

df %>% 
    group_by(set_nbr) %>% 
    slice(which(Mz <0)) %>% 
    filter(rank(-time, ties.method='first')==1) 
#  time   Mz set_nbr 
#1 29495.45 -0.50902297  1 
#2 39297.27 -0.22218980  2 
#3 29495.45 -0.00999671  3
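
To see the tie behaviour in isolation, here is a small base R sketch using only the three `time` values of `set_nbr` group 1 from the question. The variable names are illustrative.

```r
# Times of set_nbr group 1 from the question; the smallest value is tied
x <- c(24594.55, 29495.45, 24594.55)

# Ascending ranks: the tied pair occupies positions 1 and 2
r_min <- rank(x, ties.method = "min")     # 1 3 1  (two rows have rank 1)
r_max <- rank(x, ties.method = "max")     # 2 3 2  (no row has rank 1)

# Negating the values makes rank 1 the largest time;
# "first" breaks any ties by order of appearance
r_desc <- rank(-x, ties.method = "first") # 2 1 3

which(r_desc == 1)  # 2, the position of 29495.45
```

In every case rank 1 refers to the smallest input value; `ties.method` only changes how tied values are numbered.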