2015-08-14

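I have the following data frame. For each week, I want to compute the weighted average price to date in R. This is the average of Avg_price over all weeks so far, weighted by Num_items.

Existing data frame: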

> df
  week Avg_price Num_items
     1       100        10
     2       120         8
     3        90         5
     4       110        20

Desired data frame:

> df
  week Avg_price Num_items Avg_price_toDate
     1       100        10           100
     2       120         8           108.89
     3        90         5           104.78
     4       110        20           107.21

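I have worked out how to do this with a basic for loop. The loop keeps the cumulative number of items to date and the previous Avg_price_toDate. I'd like to know whether R has a better way to do this, because I want to be able to split the data frame by different product groupings.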
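The loop approach described above can be sketched as follows. This is a hypothetical reconstruction, since the question does not show the actual loop. It assumes the rows are already sorted by week. Each step re-weights the previous average by the items seen so far, adds the current week's contribution, and divides by the new item total.

```r
# Hypothetical reconstruction of the loop described in the question (not the
# asker's actual code). Assumes rows are sorted by week.
df <- data.frame(week = 1:4,
                 Avg_price = c(100, 120, 90, 110),
                 Num_items = c(10, 8, 5, 20))

running_avg <- numeric(nrow(df))
items_to_date <- 0
for (i in seq_len(nrow(df))) {
  prev_avg <- if (i == 1) 0 else running_avg[i - 1]
  new_items <- items_to_date + df$Num_items[i]
  # Previous average times items seen so far, plus this week's price * items,
  # divided by the updated item total.
  running_avg[i] <- (prev_avg * items_to_date +
                     df$Avg_price[i] * df$Num_items[i]) / new_items
  items_to_date <- new_items
}
df$Avg_price_toDate <- running_avg
print(df)
```

The loop is correct, but every step depends on the previous one. That dependence is exactly what cumsum() captures in a single vectorised call.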


Have you looked at 'weighted.mean'? – Benjamin
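Following up on the weighted.mean suggestion, here is a minimal base-R sketch. It applies stats::weighted.mean() to every prefix of the rows, so row i averages weeks 1 through i. It assumes the rows are sorted by week.

```r
df <- data.frame(week = 1:4,
                 Avg_price = c(100, 120, 90, 110),
                 Num_items = c(10, 8, 5, 20))

# For row i, take the weighted mean of Avg_price over rows 1..i,
# using Num_items as the weights.
df$Avg_price_toDate <- sapply(seq_len(nrow(df)), function(i) {
  weighted.mean(df$Avg_price[seq_len(i)], w = df$Num_items[seq_len(i)])
})
print(df)
```

This recomputes each prefix from scratch, so the work grows quadratically with the number of rows. That is fine for small data, but a cumsum()-based ratio does the same job in one pass.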

Answers


Yes, you can also use cumsum to compute a running weighted average.

transform(df, Avg_price_toDate = cumsum(Avg_price*Num_items)/cumsum(Num_items))

  week Avg_price Num_items Avg_price_toDate
1    1       100        10         100.0000
2    2       120         8         108.8889
3    3        90         5         104.7826
4    4       110        20         107.2093
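The cumsum ratio gives the average to date for a simple reason. Let $p_i$ be Avg_price and $n_i$ be Num_items in week $i$. The numerator accumulates total spend and the denominator accumulates total items. Their ratio therefore equals a loop-style update that carries the running total $N_k$ forward:

```latex
\bar{p}_k = \frac{\sum_{i=1}^{k} p_i n_i}{\sum_{i=1}^{k} n_i}
          = \frac{N_{k-1}\,\bar{p}_{k-1} + p_k n_k}{N_{k-1} + n_k},
\qquad N_k = \sum_{i=1}^{k} n_i

% Week 2:
\bar{p}_2 = \frac{100 \cdot 10 + 120 \cdot 8}{10 + 8} = \frac{1960}{18} \approx 108.8889
```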

Here is a more general data.table solution that can handle categories.

library(data.table)
dt <- data.table(category = c(rep("a", 4), rep("b", 4)),
                 week = c(1, 2, 3, 4,
                          1, 2, 3, 4),
                 Avg_price = c(100, 120, 90, 110,
                               150, 200, 250, 300),
                 Num_items = c(10, 8, 5, 20,
                               20, 30, 40, 50))
# := adds the column by reference; the outer parentheses make the result print.
(dt[, wtd := cumsum(Avg_price*Num_items)/cumsum(Num_items),
    by = "category"])

which gives:

   category week Avg_price Num_items      wtd
1:        a    1       100        10 100.0000
2:        a    2       120         8 108.8889
3:        a    3        90         5 104.7826
4:        a    4       110        20 107.2093
5:        b    1       150        20 150.0000
6:        b    2       200        30 180.0000
7:        b    3       250        40 211.1111
8:        b    4       300        50 242.8571
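A per-group cumsum does not strictly require data.table. As a sketch, here is a base-R alternative that swaps in stats::ave(), which is not part of this answer. ave() applies cumsum within each category and returns a vector aligned with the original rows. The data frame name dt_df is hypothetical. Because cumsum() depends on row order, the sketch sorts by category and week first.

```r
# Same data as the data.table example, as a plain data.frame (hypothetical name).
dt_df <- data.frame(category = rep(c("a", "b"), each = 4),
                    week = rep(1:4, times = 2),
                    Avg_price = c(100, 120, 90, 110, 150, 200, 250, 300),
                    Num_items = c(10, 8, 5, 20, 20, 30, 40, 50))

# cumsum() is order-dependent, so sort by category, then week.
dt_df <- dt_df[order(dt_df$category, dt_df$week), ]

# ave() runs FUN within each category and returns a vector aligned with the rows.
num <- ave(dt_df$Avg_price * dt_df$Num_items, dt_df$category, FUN = cumsum)
den <- ave(dt_df$Num_items, dt_df$category, FUN = cumsum)
dt_df$wtd <- num / den
print(dt_df)
```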

Using dplyr

library(dplyr) 
df %>% mutate(Avg_price_toDate = cumsum(Avg_price*Num_items)/cumsum(Num_items)) 

Using sqldf

library(sqldf) 
sqldf('SELECT a.*, SUM(b.Avg_price*b.Num_items*1.0)/SUM(b.Num_items) AS Avg_price_toDate 
     FROM df AS a, df AS b WHERE b.week <= a.week 
     GROUP BY a.week') 

Output:

  week Avg_price Num_items Avg_price_toDate
1    1       100        10         100.0000
2    2       120         8         108.8889
3    3        90         5         104.7826
4    4       110        20         107.2093

Data:

df <- structure(list(week = 1:4, Avg_price = c(100L, 120L, 90L, 110L 
), Num_items = c(10L, 8L, 5L, 20L)), .Names = c("week", "Avg_price", 
"Num_items"), class = "data.frame", row.names = c(NA, -4L)) 
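The SQL query is a self-join. Each row a is paired with every row b from the same or an earlier week, and the pairs are then summed per a. Below is a minimal base-R sketch of that logic. It illustrates the query and makes no claim about how sqldf executes it. It assumes week values are unique, as in this data, so grouping by row a matches GROUP BY a.week.

```r
df <- data.frame(week = 1:4,
                 Avg_price = c(100, 120, 90, 110),
                 Num_items = c(10, 8, 5, 20))

# All (a, b) row-index pairs: the cross join FROM df AS a, df AS b.
idx <- expand.grid(a = seq_len(nrow(df)), b = seq_len(nrow(df)))

# WHERE b.week <= a.week
idx <- idx[df$week[idx$b] <= df$week[idx$a], ]

# GROUP BY a: sum price * items and items over the matching b rows.
num <- tapply(df$Avg_price[idx$b] * df$Num_items[idx$b], idx$a, sum)
den <- tapply(df$Num_items[idx$b], idx$a, sum)

# Every a matches at least itself, so groups 1..nrow(df) line up with the rows.
df$Avg_price_toDate <- as.vector(num / den)
print(df)
```

Like the SQL version, this builds on the order of n² row pairs. The cumsum() approach stays linear.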

Combining everyone's answers, here is my implementation of the average to date, grouped by category.

Data frame:

  week Avg_price Num_items type_item
1    1       100        10         1
2    2       120         8         1
3    3        90         5         2
4    4       110        20         2

Using dplyr

library(dplyr)
df %>%
    group_by(type_item) %>%
    mutate(avg.price.by.type = cumsum(Avg_price * Num_items)/cumsum(Num_items))

Output:

  week Avg_price Num_items type_item avg.price.by.type
1    1       100        10         1          100.0000
2    2       120         8         1          108.8889
3    3        90         5         2           90.0000
4    4       110        20         2          106.0000
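The goal of splitting the data frame by product group can also be met in base R without dplyr. Below is a minimal sketch that swaps in a split-apply-combine pattern with split(), lapply() and do.call(rbind, ...). The helper add_running_avg is hypothetical. Within each group, it sorts by week before taking the cumulative sums.

```r
df <- data.frame(week = 1:4,
                 Avg_price = c(100, 120, 90, 110),
                 Num_items = c(10, 8, 5, 20),
                 type_item = c(1, 1, 2, 2))

# Hypothetical helper: running weighted mean for one group's rows, in week order.
add_running_avg <- function(d) {
  d <- d[order(d$week), ]
  d$avg.price.by.type <- cumsum(d$Avg_price * d$Num_items) / cumsum(d$Num_items)
  d
}

# Split by type_item, apply the helper to each piece, then stack the pieces.
res <- do.call(rbind, lapply(split(df, df$type_item), add_running_avg))
rownames(res) <- NULL
print(res)
```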