2014-04-04 159 views
3

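Select the row with the maximum value by group

I have been trying to do this by looking at other posts, but I keep getting an error. My data, new, looks like this: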

id year name gdp 
1 1980 Jamie 45 
1 1981 Jamie 60 
1 1982 Jamie 70 
2 1990 Kate 40 
2 1991 Kate 25 
2 1992 Kate 67 
3 1994 Joe  35 
3 1995 Joe  78 
3 1996 Joe  90 

I want to select, for each id, the row with the highest year value. So the desired output is:

id year name gdp 
1 1982 Jamie 70 
2 1992 Kate 67 
3 1996 Joe  90 

Following the post "Selecting Rows which contain daily max value in R", I tried the following, but it did not work:

ddply(new,~id,function(x){x[which.max(new$year),]}) 

I also tried

tapply(new$year, new$id, max) 

but this does not give me the desired output.

Any suggestions would really help!

Answers

2

Just use split:

df <- do.call(rbind, lapply(split(df, df$id), 
    function(subdf) subdf[which.max(subdf$year)[1], ])) 

For example,

df <- data.frame(id = rep(1:10, each = 3), year = round(runif(30,0,10)) + 1980, gdp = round(runif(30, 40, 70))) 
print(head(df)) 
# id year gdp 
# 1 1 1990 49 
# 2 1 1981 47 
# 3 1 1987 69 
# 4 2 1985 57 
# 5 2 1989 41 
# 6 2 1988 54 

df <- do.call(rbind, lapply(split(df, df$id), function(subdf) subdf[which.max(subdf$year)[1], ])) 
print(head(df)) 
# id year gdp 
# 1 1 1990 49 
# 2 2 1989 41 
# 3 3 1989 55 
# 4 4 1988 62 
# 5 5 1989 48 
# 6 6 1990 41 
+1

To be honest, this seems overly complicated for this task. You are basically recreating `by` with `split` + `lapply`. – thelatemail
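The `by` function mentioned in the comment above performs the split-and-apply step in one call. A minimal sketch, assuming the question's data frame is rebuilt by hand under the name `new` (the name `res` is only illustrative):

```r
# Rebuild the question's data frame (called `new` in the question)
new <- data.frame(
  id   = rep(1:3, each = 3),
  year = c(1980, 1981, 1982, 1990, 1991, 1992, 1994, 1995, 1996),
  name = rep(c("Jamie", "Kate", "Joe"), each = 3),
  gdp  = c(45, 60, 70, 40, 25, 67, 35, 78, 90)
)

# by() splits `new` by id and applies the function to each piece;
# do.call(rbind, ...) stacks the one-row results back into a data frame
res <- do.call(rbind, by(new, new$id, function(x) x[which.max(x$year), ]))
res
```

This should return one row per id, the one with the latest year, without the explicit `split` and `lapply` calls.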

1

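Your ddply call works fine for me, except that the callback function refers to the original dataset instead of the group it is given.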

library(plyr)

# original: indexes with the whole data frame `new`, not the group `x`
ddply(new,~id,function(x){x[which.max(new$year),]}) 
# should be 
ddply(new,.(id),function(x){x[which.max(x$year),]}) 
+2

It seems this should be the accepted answer. –
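To see why indexing with `new$year` inside the callback goes wrong, it can help to reproduce one group by hand in base R. This is only a sketch: `x` stands in for the piece ddply would pass for id 1, and `bad` and `good` are illustrative names.

```r
# Rebuild the question's data frame (called `new` in the question)
new <- data.frame(
  id   = rep(1:3, each = 3),
  year = c(1980, 1981, 1982, 1990, 1991, 1992, 1994, 1995, 1996),
  name = rep(c("Jamie", "Kate", "Joe"), each = 3),
  gdp  = c(45, 60, 70, 40, 25, 67, 35, 78, 90)
)

# Stand-in for the group that the callback receives for id 1
x <- new[new$id == 1, ]

which.max(new$year)  # position of the max year in the WHOLE data frame
which.max(x$year)    # position of the max year WITHIN the group

# The whole-table position points past the end of the 3-row group,
# so the result is a row of NAs rather than a real observation
bad  <- x[which.max(new$year), ]

# The within-group position picks the intended row
good <- x[which.max(x$year), ]
good
```

Every group has only three rows, so the whole-table position never lands on a valid row of any group.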

2

You can use duplicated:

# your data 
df <- read.table(text="id year name gdp 
1 1980 Jamie 45 
1 1981 Jamie 60 
1 1982 Jamie 70 
2 1990 Kate 40 
2 1991 Kate 25 
2 1992 Kate 67 
3 1994 Joe  35 
3 1995 Joe  78 
3 1996 Joe  90" , header=TRUE) 

# Sort by id and year (latest year is last for each id) 
df <- df[order(df$id , df$year), ] 

# Select the last row by id 
df <- df[!duplicated(df$id, fromLast=TRUE), ] 
3

Another option, which scales well to large tables, is to use data.table.

DT <- read.table(text = "id year name gdp 
          1 1980 Jamie 45 
          1 1981 Jamie 60 
          1 1982 Jamie 70 
          2 1990 Kate 40 
          2 1991 Kate 25 
          2 1992 Kate 67 
          3 1994 Joe  35 
          3 1995 Joe  78 
          3 1996 Joe  90", 
       header = TRUE) 

library(data.table) 
DT <- as.data.table(DT) 

setkey(DT,id,year) 
# latest year for each id 
res = DT[,j=list(year=max(year)),by=id] 
res 

setkey(res,id,year) 
DT[res] 
# id year name gdp 
# 1: 1 1982 Jamie 70 
# 2: 2 1992 Kate 67 
# 3: 3 1996 Joe 90 
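A more compact data.table variant avoids the keyed join by collecting the row numbers of the per-group maxima with `.I`. This is a sketch that assumes the data.table package is installed; the table is rebuilt by hand and `latest` is an illustrative name.

```r
library(data.table)

# Rebuild the question's data as a data.table
DT <- data.table(
  id   = rep(1:3, each = 3),
  year = c(1980, 1981, 1982, 1990, 1991, 1992, 1994, 1995, 1996),
  name = rep(c("Jamie", "Kate", "Joe"), each = 3),
  gdp  = c(45, 60, 70, 40, 25, 67, 35, 78, 90)
)

# Inner call: for each id, .I gives the row numbers in DT, and
# which.max(year) picks the one with the latest year (stored in column V1).
# Outer call: subset DT by those row numbers.
latest <- DT[DT[, .I[which.max(year)], by = id]$V1]
latest
```

If the year column has ties within an id, `which.max` keeps only the first of the tied rows.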
3

ave works here as well, and it also accounts for cases where several rows share the maximum year.

new[with(new, year == ave(year,id,FUN=max)),] 

# id year name gdp 
#3 1 1982 Jamie 70 
#6 2 1992 Kate 67 
#9 3 1996 Joe 90
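The difference from the `which.max` approaches only shows up when a group has ties. A small sketch with made-up data (the data frame `tied` and its values are invented for illustration and are not from the question):

```r
# Hypothetical data: id 1 has two rows sharing its maximum year
tied <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  year = c(1981, 1982, 1982, 1991, 1992),
  gdp  = c(60, 70, 71, 25, 67)
)

# ave() returns, for every row, the maximum year of that row's group,
# so the comparison keeps ALL rows that reach the group maximum
keep_all <- tied[with(tied, year == ave(year, id, FUN = max)), ]

# which.max() returns only the first position of the maximum,
# so exactly one row per group survives
keep_first <- do.call(rbind, lapply(split(tied, tied$id),
    function(g) g[which.max(g$year), ]))

nrow(keep_all)    # both tied rows for id 1, plus one row for id 2
nrow(keep_first)  # one row per id
```

Which behaviour is preferable depends on whether tied rows should all be reported or collapsed to a single row.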