2017-05-25 20 views

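Plotting a normal curve with dnorm doesn't work with ggplot in R (because of skew or missing data?)

I have a variable (day2), and I want to overlay a normal distribution on top of its histogram in ggplot. When I try to do this with the code below, I get two warning messages and no normal curve is drawn.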

Warning messages: 
1: Removed 546 rows containing non-finite values (stat_bin). 
2: Removed 1 rows containing missing values (geom_bar). 

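I thought this might have to do with the skewness of the distribution, or with the large amount of missing data (67%), but I don't understand why R won't plot the normal curve. Can anyone tell me more about why R isn't drawing it?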

#Data 
day2 <- c(1.35,1.41,NA,NA,0.08,NA,NA,NA,NA,0.44,NA,0.2,NA,1.64,0.02,NA,NA,2.05,NA,NA,0.7,NA,NA,0.85,NA,NA,NA,NA,NA,0.38,0.11,NA,NA,NA,0.82,NA,0.91,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.38,NA,NA,NA,NA,NA,0.32,0.23,NA,NA,NA,NA,0.14,NA,NA,NA,1.9,NA,NA,0.76,NA,0.7,0.55,NA,0.38,NA,NA,NA,NA,1.18,0.79,NA,NA,NA,NA,NA,2.08,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.14,0.58,NA,1.7,NA,NA,1.06,NA,NA,NA,NA,NA,NA,1.58,NA,NA,NA,NA,NA,NA,NA,NA,2.08,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1.38,1.44,1.73,NA,NA,NA,1.11,1.14,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2.12,NA,NA,NA,NA,NA,NA,NA,NA,NA,1.97,0.58,0.7,NA,NA,NA,NA,NA,NA,NA,1.35,NA,NA,NA,0.29,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.85,1.02,NA,NA,0.05,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.78,NA,NA,NA,NA,NA,NA,2.29,NA,NA,NA,NA,NA,0.23,0.44,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.47,NA,NA,NA,NA,NA,NA,NA,1.17,NA,NA,0.44,NA,0.47,NA,NA,NA,0.17,NA,0.85,NA,NA,NA,NA,1.11,NA,NA,NA,NA,NA,NA,NA,0.41,0.76,NA,NA,NA,NA,0.55,1.02,NA,NA,NA,NA,NA,NA,2.5,NA,0.32,NA,0.17,0.2,0.52,NA,0.23,NA,0.52,NA,0.84,0.26,0.76,0.85,1.52,NA,NA,NA,NA,NA,NA,2.53,NA,NA,0.52,3.35,NA,NA,NA,NA,NA,1.08,NA,1.55,1.97,NA,NA,NA,1.38,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.97,NA,NA,0.94,0.11,NA,0.82,NA,NA,NA,0.5,NA,0.58,NA,0.14,NA,1.17,0.44,0.58,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.82,NA,NA,0.76,1.14,0.17,0.9,NA,0.67,0.38,NA,NA,NA,NA,NA,NA,NA,NA,0.35,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.2,1.44,0.91,2.44,NA,NA,0.23,0.35,0.79,0.76,0.26,NA,0.73,0.79,NA,NA,NA,NA,1.11,NA,2.38,0.06,2.41,0.85,0.58,0.23,NA,NA,NA,NA,NA,0.32,NA,0.29,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.41,NA,NA,NA,NA,NA,0.14,NA,1.2,0.45,NA,NA,NA,NA,NA,NA,0.14,NA,1.88,0.91,1.79,NA,NA,3,NA,1.21,1.7,0.35,NA,1.5,NA,NA,NA,NA,NA,3.21,1.38,2.5,NA,NA,NA,NA,NA,0.7,NA,NA,NA,NA,0.7,NA,NA,0.79,NA,NA,NA,NA,NA,NA,NA,NA,0.28,NA,NA,0.41,0.64,0.85,NA,NA,0.76,NA,NA,0.91,NA,NA,2.2,2.23,NA,NA,1.05,1.29,NA,NA,0.26,1.11,0.35,NA,NA,0.2,NA,NA,0.52,0.23,1.76,1.17,NA,NA,1.2,NA,NA,0.23,NA,0.64,NA,1.94,NA,NA,1,NA,NA,NA,NA,NA,0.73,
1.58,0.55,NA,0.84,NA,0.52,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.67,NA,NA,NA,NA,NA,NA,NA,0.76,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1.64,1.75,1.08,0.91,0.94,NA,0.32,2.44,0.17,0.02,1.54,0.5,0.48,1.35,2.61,2.05,NA,0.76,0.08,2.91,NA,1,NA,0.47,0.7,NA,1.45,0.14,NA,0.38,NA,NA,0.26,2.32,0.2,NA,2.72,NA,0.41,NA,0.88,NA,0.85,0.23,NA,NA,NA,NA,NA,NA,1.23,NA,0.2,NA,1.32,2.7,NA,NA,2.55,NA,0.17,NA,NA,NA,NA,1.13,NA,0.79,NA,NA,0.38,NA,NA,1,0.2,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.47,NA,NA,0.55,0.94,1.02,NA,NA,NA,NA,0.64,0.67,1.87,NA,NA,0.82,NA,NA,NA,NA,NA,NA,0.64,NA,NA,NA,NA,1.7,NA,0.79,NA,0.58,0.11,NA,2.42,NA,NA,NA,NA,NA,NA,NA,NA,NA,0,0.23,NA,NA,0.85,1.14,NA,1.14,NA,NA,0.26,NA,NA,NA,NA,NA,0.14,NA,1.14,1.02,NA,0.94,0.55,NA,1.11,NA,NA,NA,NA,NA,NA,0.7,NA,NA,NA,1.94,NA,NA,NA,0.2,3.44,1,0.91,NA,1.58,2.85,NA,0.79,NA,0.76,0.56,1.78,NA,0.23,1.35,1.82,NA,0.17,1.7,NA,1.32,0.14,0.94,1.52,NA,NA,NA,1.41,0.32,0.58,0.44,NA,0.94,1.44,NA,NA,NA) 

day2 <- data.frame(day2) 


# ggplot script 
library(ggplot2) 
day2_hist <- ggplot(day2, aes(day2)) 
day2_hist + geom_histogram(aes(y =..density..), colour = "black", fill = "grey", bins = 20) + #specify geometric object, in this case a histogram  
labs(title = "Hygiene score Day 2") + #add main title          
scale_x_continuous("Hygiene Score", 
    limits = c(0, 4)) + #specify x-axis name 
scale_y_continuous("Density", #specify y-axis name 
    limits = c(0, 0.8)) + #specify limits 
theme(plot.title = element_text(hjust = 0.5), #align main title 
    axis.line = element_line(colour = "black", size = 0.2), #colour of the lines that contain elements 
    panel.grid.major = element_blank(), #remove major grid lines 
    panel.grid.minor = element_blank(), #remove minor grid lines 
    panel.border = element_blank(), #remove graph border 
    panel.background = element_blank()) + #remove panel background 
stat_function(fun = dnorm, #returns the probability (i.e., the density) for a given value from a normal distribution of known mean and standard deviation 
    args = list(mean = mean(day2$day2), sd = sd(day2$day2)), 
    colour = "black", size = 0.5) 

Answer

I think the problem is that day2 contains many NA values. You should tell R to ignore them by adding na.rm = TRUE to both mean and sd:

+ stat_function(
    fun = dnorm, 
    args = list(
     mean = mean(day2$day2, na.rm = TRUE), 
     sd = sd(day2$day2, na.rm = TRUE)), 
    colour = "black", size = 0.5) 
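
To see why the original call drew nothing, it can help to evaluate the two arguments on their own. The following is a minimal base R sketch. It uses only the first fifteen values of day2 from the question as an illustrative subset, and the names day2_subset, m and s are introduced here and are not part of the original post.

```r
# Illustrative subset: the first 15 values of day2 from the question
day2_subset <- c(1.35, 1.41, NA, NA, 0.08, NA, NA, NA, NA, 0.44,
                 NA, 0.2, NA, 1.64, 0.02)

# Without na.rm, a single NA makes the result NA
mean(day2_subset)
sd(day2_subset)

# With na.rm = TRUE, the NA values are dropped before computing
m <- mean(day2_subset, na.rm = TRUE)
s <- sd(day2_subset, na.rm = TRUE)
m
s
```

Running the same two pairs of calls on day2$day2 should show the same pattern: NA without na.rm, finite numbers with it.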

Also, I don't think the histogram looks like a normal distribution.
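
One rough way to check that impression numerically is sketched below in base R. It is only an illustration: it uses the first fifteen values of day2 from the question, which is far too small a subset to say anything about the full data, and the names x, skew and sw are introduced here. The skewness formula is the simple moment-based one, which is an assumption about which definition to use.

```r
# Illustrative subset: the first 15 values of day2 from the question
x <- c(1.35, 1.41, NA, NA, 0.08, NA, NA, NA, NA, 0.44,
       NA, 0.2, NA, 1.64, 0.02)
x <- x[!is.na(x)]  # drop missing values

# Moment-based sample skewness: about 0 for symmetric data,
# positive when the right tail is longer
skew <- mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5
skew

# Shapiro-Wilk test from the stats package; a small p-value
# suggests the data are not normally distributed
sw <- shapiro.test(x)
sw$p.value
```

To apply it to the real data, replace the subset with day2$day2.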

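This did indeed solve the problem. Thanks! I had previously tried the na.rm argument alongside the mean and sd arguments (args = list(mean = mean(day2$day2), sd = sd(day2$day2), na.rm = T)), but from your solution I understand that it should be specified inside mean and sd. – SHW

@SHW `dnorm` doesn't have a `na.rm =` argument, and it silently gives `NA` when the `mean` or `sd` param is `NA`, so we have to tell `sd` and `mean` to remove the `NA` values. – mt1022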
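
Both points in the last comment can be checked directly in base R. This is a small sketch; the names d_na and res are introduced here for illustration only.

```r
# dnorm() returns NA, without a warning, when mean (or sd) is NA
d_na <- dnorm(1, mean = NA, sd = 1)
d_na

# dnorm() has no na.rm argument, so passing one raises an error;
# tryCatch() captures the error object instead of stopping the script
res <- tryCatch(
  dnorm(1, mean = 0, sd = 1, na.rm = TRUE),
  error = function(e) e
)
conditionMessage(res)
```

This is why na.rm = TRUE has to go inside the mean() and sd() calls rather than into the args list that stat_function passes on to dnorm.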