2012-08-26 120 views
12

Find the maximum date for each ID

I have the following data frame:

id<-c(1,1,2,3,3) 
date<-c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08") 
df<-data.frame(id,date) 
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y") 


id     date      date2
 1 23-01-08 2008-01-23
 1 01-11-07 2007-11-01
 2 30-11-07 2007-11-30
 3 17-12-07 2007-12-17
 3 12-12-08 2008-12-12

Now I need to create a fourth column and fill it with the maximum transaction date for each id. The final table should look like this:

id     date      date2        max
 1 23-01-08 2008-01-23 2008-01-23
 1 01-11-07 2007-11-01          0
 2 30-11-07 2007-11-30 2007-11-30
 3 17-12-07 2007-12-17          0
 3 12-12-08 2008-12-12 2008-12-12

I would appreciate any help.

Answers

18
id<-c(1,1,2,3,3) 
date<-c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08") 
df<-data.frame(id,date) 
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y") 
# aggregate can be used for this type of thing 
d = aggregate(df$date2,by=list(df$id),max) 
# And merge the result of aggregate 
# with the original data frame 
df2 = merge(df,d,by.x=1,by.y=1) 
df2 

  id     date      date2          x
1  1 23-01-08 2008-01-23 2008-01-23
2  1 01-11-07 2007-11-01 2008-01-23
3  2 30-11-07 2007-11-30 2007-11-30
4  3 17-12-07 2007-12-17 2008-12-12
5  3 12-12-08 2008-12-12 2008-12-12

Edit: Since you want the last column to be "empty" when the date does not match the maximum date, you can try the following line.

df2[df2[,3]!=df2[,4],4]=NA 

df2 
  id     date      date2          x
1  1 23-01-08 2008-01-23 2008-01-23
2  1 01-11-07 2007-11-01       <NA>
3  2 30-11-07 2007-11-30 2007-11-30
4  3 17-12-07 2007-12-17       <NA>
5  3 12-12-08 2008-12-12 2008-12-12

Of course, you should still clean up the colnames and so on, but I leave that to you.
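One way the column clean-up might look, as a minimal sketch: it repeats the aggregate and merge steps from this answer, then renames the auto-generated column x. The name max_date is a hypothetical choice, not something from the question.

```r
# Rebuild the example data from the question.
df <- data.frame(
  id   = c(1, 1, 2, 3, 3),
  date = c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")

# Same aggregate + merge steps as in the answer.
d   <- aggregate(df$date2, by = list(df$id), max)
df2 <- merge(df, d, by.x = 1, by.y = 1)

# Rename the auto-generated column "x"; "max_date" is a hypothetical name.
names(df2)[names(df2) == "x"] <- "max_date"

# Set the column to NA on rows whose date is not the per-id maximum.
df2$max_date[df2$date2 != df2$max_date] <- NA
print(df2)
```

Selecting the column by name rather than by position keeps the last two steps working even if the column order of df changes.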

2
library(sqldf) 
tables<- '(SELECT * FROM df 
      ) 
      AS t1, 
      (SELECT id,max(date2) date2 FROM df GROUP BY id 
      ) 
      AS t2' 

out<-fn$sqldf("SELECT t1.*,t2.date2 mdate FROM $tables WHERE t1.id=t2.id") 
out$mdate<-as.Date(out$mdate) 
out$mdate[out$date2!=out$mdate]<-NA 
#   id     date      date2      mdate
# 1  1 01-11-07 2007-11-01       <NA>
# 2  1 23-01-08 2008-01-23 2008-01-23
# 3  2 30-11-07 2007-11-30 2007-11-30
# 4  3 12-12-08 2008-12-12 2008-12-12
# 5  3 17-12-07 2007-12-17       <NA>
1

You cannot use 0 as a Date value, so you need to either give up keeping the column as a Date or accept NA values:

# Date values: 
df$maxdt <- ave(df$date2, df$id, 
        FUN=function(x) ifelse(x == max(x), as.character(x), NA)) 
str(ave(df$date2, df$id, FUN=function(x) ifelse(x == max(x), as.character(x), NA))) 
# Date[1:5], format: "2008-01-23" NA "2007-11-30" NA "2008-12-12" 

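ifelse does some odd type handling here: the code above passes as.character(x) rather than plain x as the second argument, yet the result is still a Date-class vector. Go figure! Below is the character-vector option.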

# Character values: 
df$maxdt <- ave(as.character(df$date2), df$id, 
        FUN=function(x) ifelse(x == max(x), x, "0")) 
ave(as.character(df$date2), df$id, FUN=function(x) ifelse(x == max(x), x, "0")) 
[1] "2008-01-23" "0"   "2007-11-30" "0"   "2008-12-12" 
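If the Date class should be kept without going through ifelse at all, one possible variant is to let replace() do the NA assignment inside ave(). This is only a sketch on the question's data; the column name maxdt is taken from this answer.

```r
# Rebuild the example data from the question.
df <- data.frame(
  id   = c(1, 1, 2, 3, 3),
  date = c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")

# Within each id, keep the maximum date and set every other entry to NA.
# replace() assigns into the Date vector itself, so the class is preserved.
df$maxdt <- ave(df$date2, df$id,
                FUN = function(x) replace(x, x != max(x), NA))

str(df$maxdt)
```

Because no value is ever converted to character or numeric, there is no class juggling to reason about.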
7

Another approach is to use the plyr package:

library(plyr) 
ddply(df, "id", summarize, max = max(date2)) 

#   id        max
# 1  1 2008-01-23
# 2  2 2007-11-30
# 3  3 2008-12-12

Now, this is not in the format you are after, because it only shows each id once. Don't worry, we can use transform instead of summarize:

ddply(df, "id", transform, max = max(date2)) 

#   id     date      date2        max
# 1  1 01-11-07 2007-11-01 2008-01-23
# 2  1 23-01-08 2008-01-23 2008-01-23
# 3  2 30-11-07 2007-11-30 2007-11-30
# 4  3 12-12-08 2008-12-12 2008-12-12
# 5  3 17-12-07 2007-12-17 2008-12-12

As in @seandavi's answer, this repeats the max date for each id. If you want to change the repeats to NA, something like this will do the job:

within(ddply(df, "id", transform, max = max(date2)), max[max != date2] <- NA) 
2

Adding this in case someone is looking for a dplyr solution:

library(dplyr) 

df %>% 
    group_by(id) %>% 
    mutate(max = if_else(date2 == max(date2), date2, as.Date(NA))) 

Result:

# A tibble: 5 x 4
# Groups:   id [3]
     id date     date2      max
  <dbl> <fctr>   <date>     <date>
1     1 23-01-08 2008-01-23 2008-01-23
2     1 01-11-07 2007-11-01 NA
3     2 30-11-07 2007-11-30 2007-11-30
4     3 17-12-07 2007-12-17 NA
5     3 12-12-08 2008-12-12 2008-12-12
+0

I use it this way: mutate(flag_last = if_else(date == max(date), TRUE, FALSE)) %>% filter(flag_last == TRUE) – Rohit
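The flag-then-filter idea in this comment keeps only the latest row per id. A rough base R counterpart, assuming the df and date2 column from the question, could look like the sketch below; the names is_last and latest are hypothetical.

```r
# Rebuild the example data from the question.
df <- data.frame(
  id   = c(1, 1, 2, 3, 3),
  date = c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")

# ave() with FUN = max repeats each id's latest date on every row of that id,
# so the comparison flags the rows that carry the per-id maximum.
is_last <- df$date2 == ave(df$date2, df$id, FUN = max)

# Keep only the flagged rows.
latest <- df[is_last, ]
print(latest)
```

The comparison already yields a logical vector, so no separate TRUE/FALSE flag column is needed before filtering.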