2014-09-01 67 views
2

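ggplot2: log transformation of data and scales

This is a follow-up to my previous question, Integrating ggplot2 with user-defined stat_function(), which I answered myself yesterday. My current problem is that, in the reproducible example below, the lines that are supposed to plot the components of the data values' mixture distribution neither appear where expected nor have the expected shape, as shown in the figures below (see the red line at y = 0 in the second figure).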

enter image description here

enter image description here

Complete reproducible example:

library(ggplot2) 
library(scales) 
library(RColorBrewer) 
library(mixtools) 

NUM_COMPONENTS <- 2 

set.seed(12345) # for reproducibility 

data(diamonds, package='ggplot2') # use built-in data 
myData <- diamonds$price 

# extract 'k' components from mixed distribution 'data' 
mix.info <- normalmixEM(myData, k = NUM_COMPONENTS, 
         maxit = 100, epsilon = 0.01) 
summary(mix.info) 

numComponents <- length(mix.info$sigma) 
message("Extracted number of component distributions: ", 
     numComponents) 

calc.components <- function(x, mix, comp.number) { 

    mix$lambda[comp.number] * 
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number]) 
} 

g <- ggplot(data.frame(x = myData)) + 
    scale_fill_continuous("Count", low="#56B1F7", high="#132B43") + 
    scale_x_log10("Diamond Price [log10]", 
       breaks = trans_breaks("log10", function(x) 10^x), 
       labels = prettyNum) + 
    scale_y_continuous("Count") + 
    geom_histogram(aes(x = myData, fill = 0.01 * ..density..), 
       binwidth = 0.01) 
print(g) 

# we could select needed number of colors randomly: 
#DISTRIB_COLORS <- sample(colors(), numComponents) 

# or, better, use a palette with more color differentiation: 
DISTRIB_COLORS <- brewer.pal(numComponents, "Set1") 

distComps <- lapply(seq(numComponents), function(i) 
    stat_function(fun = calc.components, 
       args = list(mix = mix.info, comp.number = i), 
       geom = "line", # use alpha=.5 for "polygon" 
       size = 1, 
       color = "red")) # DISTRIB_COLORS[i] 
print(g + distComps) 

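UPDATE: Just a quick note on my efforts. I also tried several other options, including switching the plot's x-axis back to a linear scale and requesting a log transformation of the original data values in the histogram layer, like this: geom_histogram(aes(x = log10(data), fill = ..count..), binwidth = 0.01), but the end result stays the same. Regarding my first comment, I realized that the conversion I mentioned there is not needed as long as I refer to the ..count.. object.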

UPDATE 2: Changed the color of the lines produced by stat_function() to red, to make the problem clearer.
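
One possible reading of the symptom, sketched in base R: the mixture in the example is fitted to the raw prices, while the plot's x axis is on a log10 scale. If the component function ends up being evaluated at log10 positions, a normal density whose mean and standard deviation live on the raw price scale is practically zero there. The parameter values below are hypothetical and are not the output of normalmixEM(); component.density is an illustrative helper, not part of the original code.

```r
# Hypothetical component parameters (NOT taken from normalmixEM output)
raw.mix <- list(lambda = 0.5, mu = 4000, sigma = 1000)  # as if fitted on raw prices
log.mix <- list(lambda = 0.5, mu = 3.5,  sigma = 0.3)   # as if fitted on log10(prices)

# Weighted normal density of a single component
component.density <- function(x, mix) {
    mix$lambda * dnorm(x, mean = mix$mu, sd = mix$sigma)
}

# x positions as they appear on a log10 axis (10^2.5 up to 10^4.3)
x.log <- seq(2.5, 4.3, by = 0.1)

raw.on.log <- component.density(x.log, raw.mix)
log.on.log <- component.density(x.log, log.mix)

max(raw.on.log)  # practically zero: the curve hugs y = 0
max(log.on.log)  # well above zero: visible next to a density histogram
```

Under that assumption, fitting the mixture on log10(myData) and plotting the histogram on the same scale keeps the data, the fitted parameters and the evaluation points in one coordinate system.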

+0

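Just realized that, for this and the previous question, I probably need to multiply the component distributions' data values by the total number of elements in each component distribution (in our case they are equal) in order to move from a density distribution to a count distribution. If that makes sense, how should I do it using 'stat_function()'? I guess by adding a multiplier as a corresponding parameter to the 'calc.components' function and to the 'arg' list of 'stat_function'. – 2014-09-01 05:41:40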

+2

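I downvoted this question because it is too verbose and the bold face reduces readability. Please make your questions more concise. Also, you acknowledge that we want minimal reproducible examples. Please try to create one. – Roland 2014-09-01 07:31:21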

+1

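@Roland: I have no problem with a downvote as long as it is justified, as yours is. Sorry about the bold font; I was trying to emphasize the important elements and points. I will limit its use in the future and will try to write more compact questions. As for the reproducible example, I have just created one and will update my question shortly. Thanks for your help! – 2014-09-01 07:38:25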
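
The density-to-count conversion raised in the first comment above can be sketched in base R. For a histogram with equal bins, count = density * n * binwidth, so a component curve on the count scale would be the weighted density multiplied by the total sample size and the bin width (the component weight lambda is already part of calc.components). The data below are simulated, and calc.components.count with its n and binwidth parameters is a hypothetical variant, not part of the original code.

```r
set.seed(42)
x <- rnorm(1000, mean = 3.5, sd = 0.3)  # simulated stand-in for log10-scale data
binwidth <- 0.1
breaks <- seq(floor(min(x)), ceiling(max(x)), by = binwidth)
h <- hist(x, breaks = breaks, plot = FALSE)

# Histogram identity for equal-width bins: count = density * n * binwidth
counts.from.density <- h$density * length(x) * binwidth
all.equal(as.numeric(h$counts), counts.from.density)

# Hypothetical count-scaled variant of calc.components
calc.components.count <- function(x, mix, comp.number, n, binwidth) {
    n * binwidth * mix$lambda[comp.number] *
        dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}
```

With stat_function(), the extra values would go into the args list next to mix and comp.number, for example args = list(mix = mix.info, comp.number = i, n = length(data), binwidth = 0.01).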

Answer

3

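I have finally figured out the issue, deleted my previous answer, and am providing my latest solution below. The only thing I have not solved is the legend panel for the components: for some reason it does not appear, but for EDA purposes, to demonstrate the presence of a mixture distribution, I think the plot is good enough. The complete reproducible solution follows. Thanks to everyone who helped, directly or indirectly.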

library(ggplot2) 
library(scales) 
library(RColorBrewer) 
library(mixtools) 

NUM_COMPONENTS <- 2 

set.seed(12345) # for reproducibility 

data(diamonds, package='ggplot2') # use built-in data 
myData <- diamonds$price 


calc.components <- function(x, mix, comp.number) { 

    mix$lambda[comp.number] * 
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number]) 
} 


overlayHistDensity <- function(data, calc.comp.fun) { 

    # extract 'k' components from mixed distribution 'data' 
    mix.info <- normalmixEM(data, k = NUM_COMPONENTS, 
          maxit = 100, epsilon = 0.01) 
    summary(mix.info) 

    numComponents <- length(mix.info$sigma) 
    message("Extracted number of component distributions: ", 
      numComponents) 

    DISTRIB_COLORS <- 
    suppressWarnings(brewer.pal(NUM_COMPONENTS, "Set1")) 

    # create (plot) histogram and ... 
    g <- ggplot(as.data.frame(data), aes(x = data)) + 
    geom_histogram(aes(y = ..density..), 
        binwidth = 0.01, alpha = 0.5) + 
    theme(legend.position = 'top', legend.direction = 'horizontal') 

    comp.labels <- lapply(seq(numComponents), 
         function (i) paste("Component", i)) 

    # ... fitted densities of components 
    distComps <- lapply(seq(numComponents), function (i) 
    stat_function(fun = calc.comp.fun, 
        args = list(mix = mix.info, comp.number = i), 
        size = 2, color = DISTRIB_COLORS[i])) 

    legend <- list(scale_colour_manual(name = "Legend:", 
            values = DISTRIB_COLORS, 
            labels = unlist(comp.labels))) 

    return (g + distComps + legend) 
} 

# pass the function object itself rather than its name as a string
overlayPlot <- overlayHistDensity(log10(myData), calc.components) 
print(overlayPlot) 

Result:

enter image description here
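
On the missing legend mentioned in the answer: ggplot2 builds legends only for aesthetics that are mapped inside aes(), and the solution above sets color as a fixed parameter, which may be why scale_colour_manual() has nothing to show. Below is a minimal sketch of one way around that, assuming ggplot2 3.3 or later (for after_stat()). The data and the mix list are simulated stand-ins for log10(myData) and the normalmixEM() fit, and the names comp.colors and comp.label are illustrative.

```r
library(ggplot2)

set.seed(12345)
# Simulated stand-ins for log10(price) and for the fitted mixture
x <- c(rnorm(500, mean = 2.9, sd = 0.2), rnorm(500, mean = 3.7, sd = 0.2))
mix <- list(lambda = c(0.5, 0.5), mu = c(2.9, 3.7), sigma = c(0.2, 0.2))

calc.components <- function(x, mix, comp.number) {
    mix$lambda[comp.number] *
        dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}

# Named colour vector: the names double as legend labels
comp.colors <- c("Component 1" = "#E41A1C", "Component 2" = "#377EB8")

distComps <- lapply(seq_along(mix$mu), function(i) {
    comp.label <- names(comp.colors)[i]
    # colour is mapped inside aes(), so the colour scale can draw a legend
    stat_function(aes(colour = comp.label), fun = calc.components,
                  args = list(mix = mix, comp.number = i))
})

p <- ggplot(data.frame(x = x), aes(x = x)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = 0.05, alpha = 0.5) +
    distComps +
    scale_colour_manual(name = "Legend:", values = comp.colors) +
    theme(legend.position = "top", legend.direction = "horizontal")
print(p)
```

The same idea should carry over to overlayHistDensity() by moving color = DISTRIB_COLORS[i] into an aes() mapping on a label and supplying the colors through scale_colour_manual().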