2016-09-14

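Weighted average with ddply gives the wrong result (R, ddply)

I need to compute weighted averages in R while collapsing rows of data.

The rows are collapsed by brand and name: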
name = c("car1", "car2", "car2", "car2", "car3", "car1") 
brand = c("b1", "b2", "b2", "b2", "b3", "b1") 
production = c(10, 10, 30, 40, 10, 5) 
fuelEconomy= c(1, 2, 3, 5, 2, 4) 
size = c(10, 50, 30,40,20, 7) 
adf = data.frame(brand, name, production, fuelEconomy, size) 

library(plyr)

# Collapse to one row per (brand, name); weight each average by production
adfSum <- ddply(adf, .(brand, name),
       summarise,
       fuelEconomySum = sum(fuelEconomy*production)/sum(production),
       productionSum = sum(production),
       sizeSum = sum(size*production)/sum(production))

Result: the first weighted average (fuelEconomySum) is correct, but the last one (sizeSum) is not. The correct values are shown in parentheses.

brand name fuelEconomySum production sizeSum
b1    car1 2.000          15         17 (9)
b2    car2 3.875          80         120 (37.5)
b3    car3 2.000          10         20 (20)

I am looking for a solution that creates several weighted averages at the same time.

Thanks

Answers


This works (using dplyr and magrittr):

name = c("car1", "car2", "car2", "car2", "car3", "car1") 
brand = c("b1", "b2", "b2", "b2", "b3", "b1") 
production = c(10, 10, 30, 40, 10, 5) 
fuelEconomy= c(1, 2, 3, 5, 2, 4) 
size = c(10, 50, 30,40,20, 7) 
adf = data.frame(brand, name, production, fuelEconomy, size) 

library(magrittr)  # provides the %>% pipe
library(dplyr)

# One row per (brand, name); both averages are weighted by production
afdSum <- adf %>%
    group_by(brand, name) %>%
    summarise(fuelEconomySum = sum(fuelEconomy*production)/sum(production),
      productionSum = sum(production),
      sizeSum = sum(size*production)/sum(production)) %>%
    as.data.frame()


> afdSum 
    brand name fuelEconomySum productionSum sizeSum 
    1 b1 car1   2.000   15  9.0 
    2 b2 car2   3.875   80 37.5 
    3 b3 car3   2.000   10 20.0 
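
Since the question asks for several weighted averages at once, here is one possible variant, given only as a sketch. It uses base R's weighted.mean() instead of writing out sum(x*w)/sum(w) for each column. The names value_cols, groups and wavg are made up for this illustration and do not come from the original post.

```r
# Same example data as in the question
adf <- data.frame(
  brand       = c("b1", "b2", "b2", "b2", "b3", "b1"),
  name        = c("car1", "car2", "car2", "car2", "car3", "car1"),
  production  = c(10, 10, 30, 40, 10, 5),
  fuelEconomy = c(1, 2, 3, 5, 2, 4),
  size        = c(10, 50, 30, 40, 20, 7)
)

# Columns to average (hypothetical helper name); production is the weight
value_cols <- c("fuelEconomy", "size")

# One data frame per (brand, name) combination that actually occurs
groups <- split(adf, list(adf$brand, adf$name), drop = TRUE)

# For each group, apply weighted.mean() to every column in value_cols
wavg <- t(sapply(groups, function(g) {
  sapply(g[value_cols], weighted.mean, w = g$production)
}))

print(wavg)
```

The result is a matrix with one row per group and one column per averaged variable, so adding another variable only means extending value_cols.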

Edit: By the way, your solution works fine for me.

> devtools::session_info("plyr") 
Session info  --------------------------------------------------------------------------- 
setting value      
version R version 3.3.1 (2016-06-21) 
system x86_64, linux-gnu   
ui  RStudio (0.99.491)   
language en_US      
collate en_US.UTF-8     
tz  <NA>       
date  2016-09-14     

Packages  ------------------------------------------------------------------------------- 
package * version date  source   
plyr * 1.8.3 2015-06-12 CRAN (R 3.3.0) 
Rcpp  0.12.5 2016-05-14 CRAN (R 3.3.0) 

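Thanks for your contribution. I found the error: it was in the naming of my variables. For this post I changed the output name to productionSum to make things clearer, but in my script I had simply called it production, the same as my input column. As a result, the last operation used the sum of production instead of the individual values.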
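
The naming clash described in this comment can be reproduced with a short sketch. It assumes the plyr package is installed; the object names shadowed and reordered are invented for this illustration. plyr's summarise evaluates its arguments in order and lets later expressions see earlier results, so an output column named production replaces the per-row values for everything that follows.

```r
library(plyr)

adf <- data.frame(
  brand      = c("b1", "b2", "b2", "b2", "b3", "b1"),
  name       = c("car1", "car2", "car2", "car2", "car3", "car1"),
  production = c(10, 10, 30, 40, 10, 5),
  size       = c(10, 50, 30, 40, 20, 7)
)

# Pitfall: the output reuses the input name `production`, so the next
# expression sees the group total instead of the per-row weights.
shadowed <- ddply(adf, .(brand, name), summarise,
                  production = sum(production),
                  sizeSum    = sum(size * production) / sum(production))

# One way around it: compute the weighted mean before the name is reused
# (giving the total a different name, as in the question, also works).
reordered <- ddply(adf, .(brand, name), summarise,
                   sizeSum    = weighted.mean(size, production),
                   production = sum(production))

print(shadowed$sizeSum)   # the wrong values reported in the question
print(reordered$sizeSum)  # the values given in parentheses in the question
```

In the first call, sizeSum reduces to the plain sum of size within each group, which is where the 17 and 120 in the question come from.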