2013-07-08

Select rows for aggregation in R data.table

This is what my customer order data looks like for a single customer:

order_no customer_id product amount order_total 
     23   1  A 100   100 
     24   1  A 100   300 
     24   1  B 100   300 
     24   1  C 100   300 
     25   1  B 100   100 
     26   1  A 100   200 
     26   1  B 100   200 

I want to compute each customer's average order size in a new column, so for this customer it would be 175 = (100 + 300 + 100 + 200) / 4:

order_no customer_id amount order_total avg_order_size 
     23   1  100   100    175 
     24   1  100   300    175 
     24   1  100   300    175 
     24   1  100   300    175 
     25   1  100   100    175 
     26   1  100   200    175 
     26   1  100   200    175 

I have tried a few variations of this, but with no luck:

customer_stats <- data.table(customer_stats)[, avg_order_size := mean(order_total), by=list(order_no, customer_id)] 

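What I really need to do is pick one row from each order_no, perhaps something like a mean over order_no[1] for every order? If there is a way to do this in one step and skip creating order_total, even better.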
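One way to sketch the "one row per order_no" idea, assuming order_total already exists and is constant within each order: inside the per-customer group, keep only the first row of each order with `!duplicated(order_no)` and average those values. The sample table is rebuilt from the question and is only illustrative.

```r
library(data.table)

# Sample data rebuilt from the question (one customer, four orders)
customer_stats <- data.table(
  order_no    = c(23, 24, 24, 24, 25, 26, 26),
  customer_id = 1,
  product     = c("A", "A", "B", "C", "B", "A", "B"),
  amount      = 100,
  order_total = c(100, 300, 300, 300, 100, 200, 200)
)

# Within each customer, keep one order_total per order_no, then average
customer_stats[, avg_order_size := mean(order_total[!duplicated(order_no)]),
               by = customer_id]
```

This relies on order_no being unique within a customer; if order numbers could repeat across customers, grouping by customer_id keeps them apart.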

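Did you try 'customer_stats[, avg_order_size := mean(order_total), by = list(order_no, customer_id)]'? Using ':=' already performs the assignment, so there is no need to assign the result to your 'data.table' again. – dickoa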

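@dickoa, if you group by both 'order_no' and 'customer_id', then you will be taking the mean of '100, 300, 100, 200' separately for each order (so each row just ends up with its own order's value). – Arun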

@Arun yes, you're right. – dickoa
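To illustrate the point made in these comments, here is a small sketch on the question's sample data (rebuilt here, so the table is an assumption about the asker's actual data). Grouping by both columns returns each order's own total, and grouping by customer_id alone averages over rows rather than over orders.

```r
library(data.table)

# Sample data rebuilt from the question
customer_stats <- data.table(
  order_no    = c(23, 24, 24, 24, 25, 26, 26),
  customer_id = 1,
  amount      = 100,
  order_total = c(100, 300, 300, 300, 100, 200, 200)
)

# Grouping by order_no as well: each group is a single order, so the mean
# simply reproduces order_total
by_order <- copy(customer_stats)[, avg_order_size := mean(order_total),
                                 by = list(order_no, customer_id)]

# Grouping by customer_id only: averages over rows, so multi-line orders
# are counted once per line (1500 / 7 here, not 175)
by_customer <- copy(customer_stats)[, avg_order_size := mean(order_total),
                                    by = customer_id]
```

Neither grouping gives 175, which is why one value per order has to be selected before averaging.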

Answers

0

You can avoid creating order_total by doing this:

customer_stats[ , avg_order_size := sum(amount, na.rm=TRUE)/length(unique(order_no)), by=customer_id] 

However, I have reservations about how fast this will be.

Sorry, but why are you summing 'amount' here? He is adding up '100 + 300 + 100 + 200' (which is 'order_total', not 'amount'). – Arun

Framing effect! He created 'order_total' by summing 'amount' by 'order_no'. See the edit to his question, where he asks whether this can be done without computing 'order_total'. – asb

Aha, I see it now. – Arun

2

How about this? It seems to translate your approach directly and does not need order_total to be computed beforehand:

dat[, sum(amount), by = list(customer_id, order_no)][ ,avg_order := mean(V1), by = customer_id] 
0

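I think the trick is to key the original table by customer and order, sum the order totals by customer and order, take the average order total by customer, and then join that back to the original table.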

library(data.table)

# Your data (next time, consider putting R-formatted data in the question...):
dt <- data.table(customer_id = 1,
                 order_no = c(23, 24, 24, 24, 25, 26, 26),
                 product = c("A", "A", "B", "C", "B", "A", "B"),
                 product_amount = 100,
                 key = c("customer_id", "order_no")) # 1: key by customer and order

dt 
# customer_id order_no product product_amount 
#1:   1  23  A   100 
#2:   1  24  A   100 
#3:   1  24  B   100 
#4:   1  24  C   100 
#5:   1  25  B   100 
#6:   1  26  A   100 
#7:   1  26  B   100 

dt[ # 4: join summary back to original 
    dt[,list(order_total=sum(product_amount)),by=list(customer_id,order_no)] [ # 2: order total by customer and order 
    ,avg_order_size:=mean(order_total),by=list(customer_id)] # 3: add the average of order total by customer 
    ] 
# customer_id order_no product product_amount order_total avg_order_size 
#1:   1  23  A   100   100   175 
#2:   1  24  A   100   300   175 
#3:   1  24  B   100   300   175 
#4:   1  24  C   100   300   175 
#5:   1  25  B   100   100   175 
#6:   1  26  A   100   200   175 
#7:   1  26  B   100   200   175
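As a variant of the join-back step, here is a minimal sketch that adds the average to the original rows by reference with an update join, without setting a key. It assumes a data.table version that supports the `on=` argument; the intermediate name `avg` is only illustrative.

```r
library(data.table)

# Same sample data as above, but without a key
dt <- data.table(customer_id = 1,
                 order_no = c(23, 24, 24, 24, 25, 26, 26),
                 product = c("A", "A", "B", "C", "B", "A", "B"),
                 product_amount = 100)

# Per-order totals, then the per-customer average of those totals
avg <- dt[, list(order_total = sum(product_amount)),
          by = list(customer_id, order_no)
          ][, list(avg_order_size = mean(order_total)), by = customer_id]

# Update join: copy each customer's average onto every matching row of dt
dt[avg, on = "customer_id", avg_order_size := i.avg_order_size]
```

Unlike the keyed join above, this modifies dt in place and does not add an order_total column.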