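Using ddply to find proportions by category

2013-10-21

I want to find the percentage distribution of a numeric variable across the levels of one category, grouped by a second category. For example, suppose I have a data frame with the columns region, line_of_business and sales. I want the percentage of sales that each line_of_business accounts for, grouped by region.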

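I can do this with R's built-in aggregate and merge functions, but I'm curious whether there is a shorter way to do it with plyr's ddply function that avoids the explicit call to merge.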
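The question does not show the aggregate/merge version it refers to. As a rough sketch only, assuming a data frame df with the columns region, line_of_business and sales (the toy values below are made up for illustration), that base-R approach might look like this:

```r
# Hypothetical toy data with the columns described in the question
df <- data.frame(
  region = c("East", "East", "East", "West", "West"),
  line_of_business = c("Books", "Books", "Computers", "Books", "Computers"),
  sales = c(10, 30, 60, 25, 75),
  stringsAsFactors = FALSE
)

# Step 1: total sales for each region x line_of_business combination
by_lob <- aggregate(sales ~ region + line_of_business, data = df, FUN = sum)

# Step 2: total sales for each region
by_region <- aggregate(sales ~ region, data = df, FUN = sum)
names(by_region)[names(by_region) == "sales"] <- "region_sales"

# Step 3: the explicit merge the question would like to avoid
res <- merge(by_lob, by_region, by = "region")
res$pct <- round(100 * res$sales / res$region_sales, 2)
res <- res[order(res$region, res$line_of_business), ]
print(res)
# East: Books 40, Computers 60; West: Books 25, Computers 75
```

The merge is needed here because aggregate returns the per-region totals as a separate, shorter data frame that has to be joined back onto the per-line totals.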

Can you provide an example dataset and a reproducible example of what you've already tried? – tcash21

Answers

How about creating a cross-tabulation and taking proportions?

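# Sum sales for every region x line_of_business combination
total_sales <- xtabs(sales ~ region + line_of_business, data = df)
# margin = 1: proportions within each row, so each region sums to 1
prop.table(total_sales, 1)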

Thanks Neil, this works. – Abiel
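To make the cross-tabulation answer concrete, here is a small run on made-up data (the df below is a hypothetical toy dataset, not from the question). Margin 1 rescales each row, i.e. each region, to sum to 1; multiplying by 100 turns the shares into percentages, and as.data.frame converts the table back into a long data frame:

```r
# Hypothetical toy data, for illustration only
df <- data.frame(
  region = c("East", "East", "East", "West", "West"),
  line_of_business = c("Books", "Books", "Computers", "Books", "Computers"),
  sales = c(10, 30, 60, 25, 75)
)

# Cross-tabulate summed sales: rows = region, columns = line_of_business
total_sales <- xtabs(sales ~ region + line_of_business, data = df)

# Row-wise shares (each region sums to 1), then percentages
shares <- prop.table(total_sales, 1)
pct <- round(100 * shares, 2)
print(pct)
# East: Books 40, Computers 60; West: Books 25, Computers 75

# Back to a long data frame with a "pct" column
pct_long <- as.data.frame(pct, responseName = "pct")
print(pct_long)
```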

Here is one way to do it with plyr:

library(plyr) 
library(reshape2) 

# Create fake data 
sales = rnorm(1000,10000,1000) 
line_of_business = sample(c("Sporting Goods", "Computers", "Books"), 
          1000, replace=TRUE) 
region = sample(c("East","West","North","South"), 1000, replace=TRUE) 
dat = data.frame(sales, line_of_business, region) 

# Sales by region by line_of_business 
dat_summary = ddply(dat, .(region, line_of_business), summarise, 
        tot.sales=sum(sales)) 

# Add percentage by line_of_business, within each region 
dat_summary = ddply(dat_summary, .(region), transform, 
        pct=round(tot.sales/sum(tot.sales)*100,2)) 

# Reshape, if desired 
dat_summary_m = melt(dat_summary, id.vars=c("region","line_of_business"))
dat_summary_w = dcast(dat_summary_m, line_of_business ~ region + variable, 
         value.var='value', 
         fun.aggregate=sum) 

Here is the final result:

> dat_summary_w
  line_of_business East_tot.sales East_pct North_tot.sales North_pct South_tot.sales South_pct West_tot.sales West_pct
1            Books       852688.3    31.97        736748.4      33.2        895986.6     35.70       707540.9    27.28
2        Computers       776864.3    29.13        794480.4      35.8        933407.9     37.19       951677.9    36.70
3   Sporting Goods      1037619.8    38.90        687877.6      31.0        680199.1     27.10       933987.7    36.02
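The second ddply call above is a grouped transform: each row's tot.sales is divided by the total for its region. As a sketch of an alternative that is not from this thread, assuming only base R and a small made-up dat, the same two steps can be written with aggregate and ave(). ave() returns the group total aligned to every row, so no merge is required:

```r
# Hypothetical toy data with the same columns as the answer's `dat`
dat <- data.frame(
  region = c("East", "East", "East", "West", "West"),
  line_of_business = c("Books", "Books", "Computers", "Books", "Computers"),
  sales = c(10, 30, 60, 25, 75)
)

# Step 1 (like ddply + summarise): total sales per region x line_of_business
dat_summary <- aggregate(sales ~ region + line_of_business, data = dat, FUN = sum)
names(dat_summary)[names(dat_summary) == "sales"] <- "tot.sales"

# Step 2 (like ddply + transform): share within each region.
# ave() repeats each region's sum on every row of that region.
region_total <- ave(dat_summary$tot.sales, dat_summary$region, FUN = sum)
dat_summary$pct <- round(100 * dat_summary$tot.sales / region_total, 2)

dat_summary <- dat_summary[order(dat_summary$region, dat_summary$line_of_business), ]
print(dat_summary)
# East: Books 40, Computers 60; West: Books 25, Computers 75
```

The result stays in long format, like dat_summary in the plyr answer, so the melt/dcast reshaping step could still be applied afterwards if a wide table is wanted.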