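Aggregating and reweighting indices in R

I have a ton of price data indexed by state, date, and UPC (product code). I want to aggregate out the UPCs and combine the prices through a weighted average. I'll try my best to explain it, but you may just want to read the code below.

Each observation in the dataset consists of: UPC, date, state, price, and weight. I want to aggregate away the UPC index like this:

Take all the data points with the same date and state, multiply their prices by their weights, and sum them up. This creates a weighted average, which I call priceIndex. However, for some date & state combinations the weights do not sum to 1. So I want to create two additional columns. The first is the sum of the weights for each date & state combination. The second is a reweighted average: that is, if the original two weights were .5 and .3, change them to .5/(.5 + .3) = .625 and .3/(.5 + .3) = .375, and then recompute the weighted average as another price index.

This is what I mean: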
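The reweighting step can be sketched on its own in base R. The weights .5 and .3 come from the description above; the prices 10 and 20 are made up purely for illustration.

```r
# Weights from the description above; they do not sum to 1
w <- c(0.5, 0.3)
# Hypothetical prices, chosen only for illustration
p <- c(10, 20)

# Rescale the weights so that they sum to 1
w_norm <- w / sum(w)
print(w_norm)  # 0.625 0.375

# Weighted average using the rescaled weights ...
reweighted <- sum(p * w_norm)
# ... which equals the raw weighted sum divided by the total weight
same <- sum(p * w) / sum(w)
print(all.equal(reweighted, same))  # TRUE
```

Dividing the raw weighted sum by the total weight is therefore enough; the individual weights never need to be rescaled row by row.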

upc=c(1153801013,1153801013,1153801013,1153801013,1153801013,1153801013,2105900750,2105900750,2105900750,2105900750,2105900750,2173300001,2173300001,2173300001,2173300001) 
date=c(200601,200602,200603,200603,200601,200602,200601,200602,200603,200601,200602,200601,200602,200603,200601) 
price=c(26,28,27,27,23,24,85,84,79.5,81,78,24,19,98,47) 
state=c(1,1,1,2,2,2,1,1,2,2,2,1,1,1,2) 
weight=c(.3,.2,.6,.4,.4,.5,.5,.5,.45,.15,.5,.2,.15,.3,.45) 

# This is what I have: 
data <- data.frame(upc,date,state,price,weight) 
data 

# These are a few of the weighted calculations: 
# .3*26+85*.5+24*.2 = 55.1 
# 28*.2+84*.5+19*.15 = 50.45 
# 27*.6+98*.3 = 45.6 
# Etc. etc. 

# Here is the reweighted calculation for date=200602 & state==1: 
# 28*(.2/.85)+84*(.5/.85)+19*(.15/.85) = 59.35294 
# Or, equivalently: 
# (28*.2+84*.5+19*.15)/.85 = 59.35294 

# This is what I want: 
date=c(200601,200602,200603,200601,200602,200603) 
state=c(1,1,1,2,2,2) 
priceIndex=c(55.1,50.45,45.6,42.5,51,46.575) 
totalWeight=c(1,.85,.9,1,1,.85) 
reweightedIndex=c(55.1,59.35294,50.66667,42.5,51,54.79412) 
index <- data.frame(date,state,priceIndex,totalWeight,reweightedIndex) 
index

Also, not that it should matter, but there are 35 states, 150 UPCs, and 84 dates in the dataset, so there are a lot of observations.

Thanks a lot.

Source

2016-02-13 ejn

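We can use one of the group-by summarise operations. With data.table, we convert the 'data.frame' to a 'data.table' (setDT(data)), group by 'date' and 'state', compute the sum of the product of 'price' and 'weight' and sum(weight) as temporary variables, and then create the 3 variables in a list based on those.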

library(data.table) 
setDT(data)[, {tmp1 = sum(price*weight) 
       tmp2 = sum(weight) 
     list(priceIndex=tmp1, totalWeight=tmp2, 
       reweightedIndex = tmp1/tmp2)}, .(date, state)] 
# date state priceIndex totalWeight reweightedIndex 
#1: 200601  1  55.100  1.00  55.10000 
#2: 200602  1  50.450  0.85  59.35294 
#3: 200603  1  45.600  0.90  50.66667 
#4: 200603  2  46.575  0.85  54.79412 
#5: 200601  2  42.500  1.00  42.50000 
#6: 200602  2  51.000  1.00  51.00000

Or, using dplyr, we can use summarise to create the 3 columns after grouping by 'date' and 'state'.

library(dplyr) 
data %>% 
    group_by(date, state) %>% 
    summarise(priceIndex = sum(price*weight), 
      totalWeight = sum(weight), 
      reweightedIndex = priceIndex/totalWeight) 
# date state priceIndex totalWeight reweightedIndex 
# (dbl) (dbl)  (dbl)  (dbl)   (dbl) 
#1 200601  1  55.100  1.00  55.10000 
#2 200601  2  42.500  1.00  42.50000 
#3 200602  1  50.450  0.85  59.35294 
#4 200602  2  51.000  1.00  51.00000 
#5 200603  1  45.600  0.90  50.66667 
#6 200603  2  46.575  0.85  54.79412
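For comparison, the same three columns can be sketched in base R with aggregate(). This is not part of the answer above, only a dependency-free cross-check; the helper column `pw` is a name introduced here for illustration.

```r
# Rebuild the question's example data so that this block runs on its own
date   <- c(200601,200602,200603,200603,200601,200602,200601,200602,
            200603,200601,200602,200601,200602,200603,200601)
state  <- c(1,1,1,2,2,2,1,1,2,2,2,1,1,1,2)
price  <- c(26,28,27,27,23,24,85,84,79.5,81,78,24,19,98,47)
weight <- c(.3,.2,.6,.4,.4,.5,.5,.5,.45,.15,.5,.2,.15,.3,.45)
data   <- data.frame(date, state, price, weight)

# Helper column (hypothetical name): each row's price times its weight
data$pw <- data$price * data$weight

# Sum the weighted prices and the weights within each date/state group
index <- aggregate(data[c("pw", "weight")],
                   by = data[c("date", "state")], FUN = sum)
names(index)[3:4] <- c("priceIndex", "totalWeight")

# Dividing by the total weight gives the reweighted average
index$reweightedIndex <- index$priceIndex / index$totalWeight

# Sort by state, then date, to match the layout requested in the question
index <- index[order(index$state, index$date), ]
index
```

On data of the size described in the question, the data.table and dplyr versions are likely to be faster; the base R version is mainly useful for checking the results.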

Source

2016-02-13 16:41:48 akrun

For the dplyr one, I only get one row when I run it? – ejn

@ejn You can use `dplyr::summarise` (if you have also loaded `plyr`). – akrun
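A likely explanation for the single row is that plyr was attached after dplyr, so that plyr's summarise() masks dplyr's and ignores the grouping. A minimal sketch of the namespace-qualified call, using a small made-up data frame (`df`, `g`, `x`, and `total` are illustrative names only):

```r
library(dplyr)

# Small made-up data set, used only for illustration
df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))

# Qualifying the call with dplyr:: ensures dplyr's summarise() is used,
# even if another attached package exports a function with the same name
res <- df %>%
  group_by(g) %>%
  dplyr::summarise(total = sum(x))
res
```

The result has one row per group rather than a single row for the whole data frame.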
