2014-07-23

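Grouped count summary in R data.table

I have a table containing a date, a buy value and a sell value. I want to count the number of buys and the number of sells on each day (rows where the value is greater than 0), as well as the total number of buys and sells. I am finding this a bit tricky in data.table.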

date buy sell  
2011-01-01 1 0 
2011-01-02 0 0 
2011-01-03 0 2 
2011-01-04 3 0 
2011-01-05 0 0 
2011-01-06 0 0 
2011-01-01 0 0 
2011-01-02 0 1 
2011-01-03 4 0 
2011-01-04 0 0 
2011-01-05 0 0 
2011-01-06 0 0 
2011-01-01 0 0 
2011-01-02 0 8 
2011-01-03 2 0 
2011-01-04 0 0 
2011-01-05 0 0 
2011-01-06 0 5 

The data.table above can be created with the following code:

library(data.table)

DT = data.table(
      date = rep(as.Date('2011-01-01') + 0:5, 3),
      buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
      sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5))

What I want as the result is:

date total_buys total_sells 
2011-01-01 1   0 
2011-01-02 0   2 
       and so on 

I would also like to know the total number of buys and sells:

total_buys total_sells 
    4   4 

I have tried:

length(DT[sell > 0 | buy > 0])
# [1] 3

which is a strange answer (I'd like to know why).
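The 3 comes from `length()` itself: a data.table is stored as a list of columns, so `length()` returns the number of columns (date, buy, sell), not the number of rows. A minimal sketch of the row-counting alternatives, using `nrow()` and data.table's `.N`; the names `active`, `n_cols`, `n_rows` and `n_rows_dt` are illustrative only:

```r
library(data.table)

DT = data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5))

active <- DT[sell > 0 | buy > 0]

# A data.table is a list of columns, so length() counts columns, not rows
n_cols <- length(active)   # 3: date, buy, sell

# nrow() counts the rows that passed the filter
n_rows <- nrow(active)     # 8

# .N inside [ ] returns the row count without building the subset first
n_rows_dt <- DT[sell > 0 | buy > 0, .N]   # 8
```

The 8 rows here equal 4 buys + 4 sells only because no row has both a buy and a sell; counting the two columns separately, as in the answers, does not rely on that.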

Answer

## by date 
DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0)), by = date] 
##   date total_buys total_sells 
## 1: 2011-01-01   1   0 
## 2: 2011-01-02   0   2 
## 3: 2011-01-03   2   1 
## 4: 2011-01-04   1   0 
## 5: 2011-01-05   0   0 
## 6: 2011-01-06   0   1 

DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0))] 
## total_buys total_sells 
## 1:   4   4 
sum adds up the buy values; I was expecting to count them. There are 4 buys and 4 sells in total. – user1480926

@user1480926 Updated the answer. –

Thanks Jake, would you mind explaining how this works? It's a very concise way of doing it, kudos. – user1480926
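The mechanism behind the answer: `buy > 0` produces a logical vector, and `sum()` treats `TRUE` as 1 and `FALSE` as 0, so it counts matching rows instead of adding the values; with `by = date`, the `j` expression is evaluated separately on each date's rows. A small sketch of both steps (all variable names besides `DT` are illustrative only):

```r
library(data.table)

DT = data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5))

# Step 1: a comparison gives a logical vector; sum() treats TRUE as 1, FALSE as 0
is_buy     <- DT$buy > 0
buy_count  <- sum(is_buy)    # number of buy rows: 4
buy_amount <- sum(DT$buy)    # adds the values instead: 1 + 3 + 4 + 2 = 10

# Step 2: with by = date, j only sees one date's rows at a time.
# Doing it by hand for 2011-01-03 (buy values 0, 4, 2; sell values 2, 0, 0):
jan03       <- DT[date == as.Date("2011-01-03")]
jan03_buys  <- sum(jan03$buy > 0)    # 2
jan03_sells <- sum(jan03$sell > 0)   # 1

# The grouped call from the answer yields the same numbers for that date
by_date <- DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0)), by = date]
```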

An alternative to @Jake's answer is the typical melt + dcast routine, something like:

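library(data.table)  # current data.table provides melt()/dcast() itself; older versions also needed library(reshape2)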
dtL <- melt(DT, id.vars = "date") 
dcast.data.table(dtL, date ~ variable, value.var = "value", 
       fun.aggregate = function(x) sum(x > 0)) 
#   date buy sell 
# 1 2011-01-01 1 0 
# 2 2011-01-02 0 2 
# 3 2011-01-03 2 1 
# 4 2011-01-04 1 0 
# 5 2011-01-05 0 0 
# 6 2011-01-06 0 1 
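To see what `dcast` is aggregating, it can help to look at the long table: `melt` stacks `buy` and `sell` into a single `value` column and records the source column in `variable`, so the 18 rows become 36. The sketch below assumes a recent data.table, where `melt()` and `dcast()` work on data.tables without reshape2; `long_counts` and `wide` are illustrative names:

```r
library(data.table)

DT = data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5))

# Long format: one row per (original row, measure column) pair, 18 x 2 = 36 rows
dtL <- melt(DT, id.vars = "date")   # columns: date, variable, value

# The aggregation dcast performs, written as a plain grouped count (6 dates x 2 variables)
long_counts <- dtL[, list(count = sum(value > 0)), by = list(date, variable)]

# Casting back to wide gives one count column per level of variable
wide <- dcast(dtL, date ~ variable, value.var = "value",
              fun.aggregate = function(x) sum(x > 0))
```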

Or, without melting, just:

DT[, lapply(.SD, function(x) sum(x > 0)), by = date] 

To get your other table, try:

dtL[, list(count = sum(value > 0)), by = variable] 
# variable count 
# 1:  buy  4 
# 2:  sell  4 

Or, without melting:

DT[, lapply(.SD, function(x) sum(x > 0)), .SDcols = c("buy", "sell")] 
# buy sell 
# 1: 4 4 
Thanks Ananda, this is cool too! – user1480926

@user1480926, I thought I'd share it because it becomes much more convenient if you have more than 2 columns. – A5C1D2H2I1M1N2O1R2T1
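The point about more than two columns can be sketched like this: because `lapply(.SD, ...)` applies the same counting function to every column listed in `.SDcols`, adding a column needs no change to the aggregation itself. The `refund` column below is hypothetical, invented only to show a third value column; `count_cols`, `per_date` and `totals` are illustrative names:

```r
library(data.table)

DT = data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5))

# Hypothetical third value column, added only for illustration
DT[, refund := c(0,0,0,0,1,0, 0,0,0,0,0,0, 0,0,0,0,0,2)]

# Count every column except the grouping column; nothing is named by hand
count_cols <- setdiff(names(DT), "date")

# Per-date counts and overall totals, using the same counting function
per_date <- DT[, lapply(.SD, function(x) sum(x > 0)), by = date, .SDcols = count_cols]
totals   <- DT[, lapply(.SD, function(x) sum(x > 0)), .SDcols = count_cols]
```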