2013-10-25 44 views
6

Sum values from common IDs in a data frame

I have a data frame that looks like this:

df<-data.frame(id=c("xx33","xx33","xx22","xx11","xx11","xx00"),amount=c(10,15,100,20,10,15),date=c("01/02/2013","01/02/2013","02/02/2013","03/03/2013","03/03/2013","04/04/2013")) 

    id amount date 
1 xx33 10 01/02/2013 
2 xx33 15 01/02/2013 
3 xx22 100 02/02/2013 
4 xx11 20 03/03/2013 
5 xx11 10 03/03/2013 
6 xx00 15 04/04/2013 

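I want to collapse all rows that share an ID, summing the amount and counting how many times each ID occurs, while also carrying along the information that is common to each ID, such as the date (and any other variables). So the output I want is: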

id sum date  number 
1 xx33 25 01/02/2013 2 
2 xx22 100 02/02/2013 1 
3 xx11 30 03/03/2013 2 
4 xx00 15 04/04/2013 1 

I have tried

ddply(.data = df, .var = "id", .fun = nrow) 

which returns the number of occurrences of each ID, but I can't figure out a way to sum the amounts for each common ID without a loop.

Answers

6

Here is a solution using the plyr package:

library(plyr) 
ddply(df,.(date,id),summarize,sum=sum(amount),number=length(id)) 
      date id sum number 
    1 01/02/2013 xx33 25  2 
    2 02/02/2013 xx22 100  1 
    3 03/03/2013 xx11 30  2 
    4 04/04/2013 xx00 15  1 
5

Using the data.table library:

library(data.table) 
dt <- data.table(df) 
dt2 <- dt[,list(sumamount = sum(amount), freq = .N), by = c("id","date")] 

Output:

> dt2 
    id  date sumamount freq 
1: xx33 01/02/2013  25 2 
2: xx22 02/02/2013  100 1 
3: xx11 03/03/2013  30 2 
4: xx00 04/04/2013  15 1 
4

Here is a base R solution:

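# Join the per-ID counts on id; cbind() would pair rows by position and attach counts to the wrong IDs
> merge(aggregate(amount ~ id + date, data = df, FUN = sum), as.data.frame(table(id = df$id)))
    id       date amount Freq
1 xx00 04/04/2013     15    1
2 xx11 03/03/2013     30    2
3 xx22 02/02/2013    100    1
4 xx33 01/02/2013     25    2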
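
The sum and the count can also be computed in a single `aggregate` call, so that both values come from the same grouping and cannot end up in a different row order. This is a minimal sketch, assuming the `df` from the question; the names `amount.sum` and `amount.number` come from flattening the result and are not part of the original answer.

```r
# Data frame from the question
df <- data.frame(
  id     = c("xx33", "xx33", "xx22", "xx11", "xx11", "xx00"),
  amount = c(10, 15, 100, 20, 10, 15),
  date   = c("01/02/2013", "01/02/2013", "02/02/2013",
             "03/03/2013", "03/03/2013", "04/04/2013")
)

# One aggregate() call; FUN returns two values per group, so the
# 'amount' column of the result is a two-column matrix (sum, number)
res <- aggregate(amount ~ id + date, data = df,
                 FUN = function(x) c(sum = sum(x), number = length(x)))

# Flatten the matrix column into the ordinary columns
# amount.sum and amount.number
res <- do.call(data.frame, res)
print(res)
```

Grouping by both `id` and `date` only gives one row per ID when the date is constant within each ID, as it is in the example data.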
3

Obligatory base R answer:

unique(transform(df, amount=ave(amount, id, FUN=sum), 
        count=ave(amount, id, FUN=length))) 
#  id amount  date count 
# 1 xx33  25 01/02/2013  2 
# 3 xx22 100 02/02/2013  1 
# 4 xx11  30 03/03/2013  2 
# 6 xx00  15 04/04/2013  1
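
The question also asks to carry along "any other variables". As far as I can tell, the `unique(transform(...))` approach does this without naming them, as long as each extra column is constant within an ID. The sketch below adds a `region` column, which is purely hypothetical and used only for illustration.

```r
# Data frame from the question plus a hypothetical 'region' column
# that is constant within each id (an assumption for illustration)
df <- data.frame(
  id     = c("xx33", "xx33", "xx22", "xx11", "xx11", "xx00"),
  amount = c(10, 15, 100, 20, 10, 15),
  date   = c("01/02/2013", "01/02/2013", "02/02/2013",
             "03/03/2013", "03/03/2013", "04/04/2013"),
  region = c("N", "N", "S", "E", "E", "W")
)

# ave() writes the per-id sum and count onto every row of that id;
# rows of the same id then become identical and unique() collapses them
res <- unique(transform(df,
                        amount = ave(amount, id, FUN = sum),
                        count  = ave(amount, id, FUN = length)))
print(res)
```

If an extra column varies within an ID, the rows for that ID stay distinct and `unique` keeps all of them, so the result would have more than one row for that ID.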