2013-04-24 45 views
1

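How to plot weighted means on a boxplot

After quite a long time searching for a solution and fiddling around, I am trying to show weighted means on a boxplot. (I thought I had submitted this query to the ggplot2 mailing list, but that was more than 4 hours ago and my question still has not appeared. Worried that I made a mistake with that post, I am posting here, because my question is quite urgent.)

I provide a toy example below.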

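# packages (load Hmisc before plyr so that plyr's summarize() is not masked)
library(Hmisc)    # wtd.mean()
library(plyr)     # ddply()
library(ggplot2)

# data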

value <- c(5, 7, 8, 6, 7, 9, 10, 6, 7, 10) 
category <- c("one", "one", "one", "two", "two", "two", 
       "three", "three", "three","three") 
weight <- c(1, 1.2, 2, 3, 2.2, 2.5, 1.8, 1.9, 2.2, 1.5) 
df <- data.frame(value, category, weight) 

#unweighted means by category 
ddply(df, .(category), summarize, mean=round(mean(value, na.rm=TRUE), 2)) 

  category mean
1      one 6.67
2    three 8.25
3      two 7.33

#weighted means by category 
ddply(df, .(category), summarize, 
      wmean=round(wtd.mean(value, weight, na.rm=TRUE), 2)) 

  category wmean
1      one  7.00
2    three  8.08
3      two  7.26

#unweighted means added to boxplot (which works fine) 
#note: ggplot2 >= 3.3.0 uses `fun`; older versions used `fun.y`
ggplot(df, aes(x = category, y = value, weight = weight)) + 
    geom_boxplot(width=0.6, colour = I("#3366FF")) + 
    stat_summary(fun ="mean", geom ="point", shape = 23, 
       size = 3, fill ="white") 

My question is: how can I show the weighted means, rather than the unweighted means, on the boxplot?

Answers

4

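You can save the weighted means as a new data frame and then use it to draw geom_point(). The argument inherit.aes=FALSE ensures that the points do not inherit the aesthetics supplied in the ggplot() call.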

# Load Hmisc before plyr: both export summarize(), and ddply() needs plyr's
library(Hmisc) 
library(plyr) 
library(ggplot2) 
df.wm<-ddply(df, .(category), summarize, 
      wmean=round(wtd.mean(value, weight, na.rm=TRUE), 2)) 

ggplot(df, aes(x = category, y = value, weight = weight)) + 
    geom_boxplot(width=0.6, colour = I("#3366FF")) + 
    geom_point(data=df.wm,aes(x=category,y=wmean),shape = 23, 
      size = 3, fill ="white",inherit.aes=FALSE) 

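For comparison, here is a minimal sketch of the same idea without plyr or Hmisc. It assumes that base R's weighted.mean(), which computes sum(w * x) / sum(w), is an acceptable substitute for wtd.mean() on this data. The name df.wm2 is made up for this illustration, and the toy data frame from the question is rebuilt so that the snippet runs on its own.

```r
library(ggplot2)

# Toy data from the question
df <- data.frame(
  value    = c(5, 7, 8, 6, 7, 9, 10, 6, 7, 10),
  category = c("one", "one", "one", "two", "two", "two",
               "three", "three", "three", "three"),
  weight   = c(1, 1.2, 2, 3, 2.2, 2.5, 1.8, 1.9, 2.2, 1.5)
)

# Weighted mean per category, i.e. sum(w * x) / sum(w) within each group
wmean <- sapply(split(df, df$category),
                function(d) weighted.mean(d$value, d$weight, na.rm = TRUE))

# One row per category (df.wm2 is a hypothetical name for this sketch)
df.wm2 <- data.frame(category = names(wmean),
                     wmean = round(unname(wmean), 2))

# Boxplot from the full data, weighted means drawn from the summary data frame;
# inherit.aes = FALSE stops geom_point() from looking for value and weight
p <- ggplot(df, aes(x = category, y = value, weight = weight)) +
  geom_boxplot(width = 0.6, colour = "#3366FF") +
  geom_point(data = df.wm2, aes(x = category, y = wmean),
             shape = 23, size = 3, fill = "white", inherit.aes = FALSE)
p
```

The values in df.wm2 should match the weighted means shown in the question (7.00, 8.08 and 7.26 for one, three and two), so the diamonds should land in the same places as with the ddply() version.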

+1

That is just what the doctor ordered. Thank you very much! This is very helpful. – user2317662 2013-04-25 05:57:38

+0

For some reason I got an error with this code, but the code in [this question](http://stackoverflow.com/questions/3277326/group-by-in-r-ddply-with-weighted-mean) works. – Tom 2013-09-27 07:47:20

+0

@Tom For me, this code still works without any errors. – 2013-09-27 07:55:44