Adding a density line to a histogram of count data in ggplot2

I want to add a density line (a normal density, actually) to a histogram.

Suppose I have the following data. I can plot a histogram with ggplot2:

library(ggplot2)

set.seed(123)
df <- data.frame(x = rbeta(10000, shape1 = 2, shape2 = 4))

ggplot(df, aes(x = x)) + geom_histogram(colour = "black", fill = "white", 
             binwidth = 0.01)

I can add a density line using:

ggplot(df, aes(x = x)) + 
    geom_histogram(aes(y = ..density..),colour = "black", fill = "white", 
       binwidth = 0.01) + 
    stat_function(fun = dnorm, args = list(mean = mean(df$x), sd = sd(df$x)))

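But this is not what I really want: I want this density line to be fitted to the count data.

I found a similar post (HERE) that offers a solution to this problem, but it did not work in my case. I need an arbitrary expansion factor to get what I want, and that is not generalizable at all: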

ef <- 100 # Expansion factor 

ggplot(df, aes(x = x)) + 
    geom_histogram(colour = "black", fill = "white", binwidth = 0.01) + 
    stat_function(fun = function(x, mean, sd, n){ 
    n * dnorm(x = x, mean = mean, sd = sd)}, 
    args = list(mean = mean(df$x), sd = sd(df$x), n = ef))

Any clues that would help me generalize this

first to the normal distribution,
then to any other bin size,
and lastly to any other distribution

would be very helpful.

Source

2014-12-26 HBat

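Did you mean for two of the plot images to be the same? It looks like the same file was uploaded twice. – arvi1000

Use fitdistr(...) in the MASS package to fit the distribution. – jlhoward

Fitting a distribution function does not happen by magic. You have to do it explicitly. One way is to use fitdistr(...) in the MASS package.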

library(MASS) # for fitdistr(...)
# excellent fit (of course...) 
ggplot(df, aes(x = x)) + 
    geom_histogram(aes(y=..density..),colour = "black", fill = "white", binwidth = 0.01)+ 
    stat_function(fun=dbeta,args=fitdistr(df$x,"beta",start=list(shape1=1,shape2=1))$estimate)

# horrible fit - no surprise here 
ggplot(df, aes(x = x)) + 
    geom_histogram(aes(y=..density..),colour = "black", fill = "white", binwidth = 0.01)+ 
    stat_function(fun=dnorm,args=fitdistr(df$x,"normal")$estimate)

# mediocre fit - also not surprising... 
ggplot(df, aes(x = x)) + 
    geom_histogram(aes(y=..density..),colour = "black", fill = "white", binwidth = 0.01)+ 
    stat_function(fun=dgamma,args=fitdistr(df$x,"gamma")$estimate)
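
The "excellent", "horrible" and "mediocre" labels above are visual judgments. As a rough numerical cross-check, the three fits can be compared by AIC. This is only a sketch and not part of the original answer: it assumes the same simulated data as in the question, and AIC is just one of several possible criteria.

```r
library(MASS)  # for fitdistr()

set.seed(123)
x <- rbeta(10000, shape1 = 2, shape2 = 4)  # same data as in the question

# Fit the three candidate distributions by maximum likelihood
fits <- list(
  beta   = fitdistr(x, "beta", start = list(shape1 = 1, shape2 = 1)),
  normal = fitdistr(x, "normal"),
  gamma  = fitdistr(x, "gamma")
)

# AIC works here because MASS provides a logLik() method for fitdistr objects;
# a lower value indicates a better fit among the candidates
aic <- sapply(fits, AIC)
print(sort(aic))
```

Because the data were simulated from a beta distribution, the beta fit should come out with the lowest AIC.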

Edit: in response to the OP's comment.

The scaling factor is binwidth × sample size.

ggplot(df, aes(x = x)) + 
    geom_histogram(colour = "black", fill = "white", binwidth = 0.01)+ 
    stat_function(fun=function(x,shape1,shape2)0.01*nrow(df)*dbeta(x,shape1,shape2), 
       args=fitdistr(df$x,"beta",start=list(shape1=1,shape2=1))$estimate)
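
A short justification of the factor used above, written as a sketch. The symbols are notation introduced here, not taken from the thread: n is the sample size, h the bin width, f the fitted density, and [a, a+h) one histogram bin.

```latex
\mathbb{E}\bigl[\text{count in } [a, a+h)\bigr]
  = n \int_{a}^{a+h} f(t)\,dt
  \approx n\,h\,f(x), \qquad x \in [a, a+h)
```

The approximation holds when h is small relative to the scale on which f varies. With n = 10000 and h = 0.01 the factor is n h = 100, which matches the expansion factor ef = 100 found by trial in the question.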

Source

2014-12-26 21:56:45 jlhoward

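Thanks for the generalization to different distributions. My ultimate goal is to fit these lines to the count data rather than to the density. Do you have any insight into how to do that? (I want to obtain the same graph as the third plot of the original post.) – HBat

See the edit above. – jlhoward

The '0.01' value in the formula ('0.01 * nrow(df) * dbeta(x, shape1, shape2)') does not work for different binwidths or sample sizes. Suppose I have a sample size of 2474 (instead of 10000) and a binwidth of 0.03 (instead of 0.01). I believe 0.01 should be a function of the bin width and possibly the sample size. – HBat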
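
For the scenario in this last comment, one possible sketch keeps the bin width in a variable and takes the sample size from the data, so nothing is hard-coded. The helper scaled_density() and the names df2 and bw are hypothetical, introduced only for illustration, and the data are simulated from the same beta distribution as in the question.

```r
library(ggplot2)
library(MASS)  # for fitdistr()

# Hypothetical helper (not from the thread): wrap a density function so that
# it is scaled to the count axis of a histogram with n observations and the
# given bin width
scaled_density <- function(dfun, n, binwidth) {
  function(x, ...) n * binwidth * dfun(x, ...)
}

set.seed(123)
df2 <- data.frame(x = rbeta(2474, shape1 = 2, shape2 = 4))  # sample size from the comment
bw <- 0.03                                                   # bin width from the comment

fit <- fitdistr(df2$x, "beta", start = list(shape1 = 1, shape2 = 1))

p <- ggplot(df2, aes(x = x)) +
  geom_histogram(colour = "black", fill = "white", binwidth = bw) +
  stat_function(fun = scaled_density(dbeta, nrow(df2), bw),
                args = as.list(fit$estimate))
print(p)
```

The same wrapper should work with dnorm or dgamma and the matching fitdistr() estimates, since the scaling depends only on the bin width and the number of observations.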
