ggplot2: log transformation of data and scales

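This is a follow-up to my previous question, Integrating ggplot2 with user-defined stat_function(), which I answered myself yesterday. My current problem is that, in the reproducible example below, the lines that are supposed to plot the components of the data values' mixture distribution neither appear in the expected place nor have the expected shape, as shown in the figure below (see the red lines at y = 0 in the second figure).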

[Figure: the resulting plots; in the second one the component lines appear as red lines at y = 0]

Complete reproducible example:

library(ggplot2) 
library(scales) 
library(RColorBrewer) 
library(mixtools) 

NUM_COMPONENTS <- 2 

set.seed(12345) # for reproducibility 

data(diamonds, package='ggplot2') # use built-in data 
myData <- diamonds$price 

# extract 'k' components from mixed distribution 'data' 
mix.info <- normalmixEM(myData, k = NUM_COMPONENTS,
                        maxit = 100, epsilon = 0.01)
summary(mix.info)

numComponents <- length(mix.info$sigma)
message("Extracted number of component distributions: ",
        numComponents)

calc.components <- function(x, mix, comp.number) { 

    mix$lambda[comp.number] * 
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number]) 
} 

g <- ggplot(data.frame(x = myData)) + 
    scale_fill_continuous("Count", low="#56B1F7", high="#132B43") + 
    scale_x_log10("Diamond Price [log10]",
                  breaks = trans_breaks("log10", function(x) 10^x),
                  labels = prettyNum) +
    scale_y_continuous("Count") +
    geom_histogram(aes(x = myData, fill = 0.01 * ..density..),
                   binwidth = 0.01)
print(g) 

# we could select needed number of colors randomly: 
#DISTRIB_COLORS <- sample(colors(), numComponents) 

# or, better, use a palette with more color differentiation: 
DISTRIB_COLORS <- brewer.pal(numComponents, "Set1") 

distComps <- lapply(seq(numComponents), function(i)
    stat_function(fun = calc.components,
                  args = list(mix = mix.info, comp.number = i),
                  geom = "line", # use alpha=.5 for "polygon"
                  size = 1,
                  color = "red")) # DISTRIB_COLORS[i]
print(g + distComps)

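UPDATE: Just a quick note on my efforts. I also tried several other options, including switching the plot's x-axis back to a normal (linear) scale and requesting a log transformation of the original data values in the histogram part, like this: geom_histogram(aes(x = log10(data), fill = ..count..), binwidth = 0.01), but the end result remains the same. Regarding my first comment, I realized that the transformation I mentioned is not needed as long as I use the reference to the ..count.. object.

UPDATE 2: Changed the color of the lines produced by stat_function() to red, to clarify the problem.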
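
A possible way to see why a density fitted on raw prices cannot simply be drawn over a log10 axis is the change-of-variables rule: if Y = log10(X), then f_Y(y) = f_X(10^y) * 10^y * ln(10). The sketch below checks this numerically with base R only. The values of `mu` and `sigma` are made-up illustration values, not the parameters that `normalmixEM()` returns for the diamonds data.

```r
# Hypothetical parameters for a single normal component on the raw scale
mu <- 4000
sigma <- 500

# Density of X on the raw scale
f_x <- function(x) dnorm(x, mean = mu, sd = sigma)

# Density of Y = log10(X): the raw density times the Jacobian dx/dy
f_y <- function(y) f_x(10^y) * 10^y * log(10)

# Both densities integrate to (almost) 1 over matching ranges
area_x <- integrate(f_x, lower = 1000, upper = 7000)$value
area_y <- integrate(f_y, lower = log10(1000), upper = log10(7000))$value

# Heights of the two curves at the same point (the raw-scale mode)
peak_raw <- f_x(mu)
peak_log <- f_y(log10(mu))
```

On the raw scale the curve's height is tiny compared with its height on the log10 scale, which is consistent with component lines that look flat near y = 0 when the fit and the axis use different scales.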

Source

2014-09-01 Aleksandr Blekh

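I just realized that, for this and the previous question, I might need to **multiply** the component distributions' *data values* by the *total number of elements in each component distribution* (in our case they are equal) in order to move from a *density distribution* to a *count distribution*. If that makes sense, how should I do it using `stat_function()`? I guess by adding a multiplier as a corresponding argument to the `calc.components` function and a corresponding entry in the argument list passed to `stat_function`. – 2014-09-01 05:41:40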
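
The scaling this comment describes can be sketched as follows: with a fixed bin width, the expected count in a bin is roughly the number of observations times the bin width times the density at the bin's midpoint. The example uses simulated data and base R's `hist()` rather than the diamonds prices or ggplot2, so all values here are illustrative assumptions.

```r
set.seed(1)  # reproducible simulated data (not the diamonds prices)
x <- rnorm(10000, mean = 3.5, sd = 0.4)
binwidth <- 0.1

# Histogram with fixed-width bins; hist() returns both counts and densities
h <- hist(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = binwidth),
          plot = FALSE)

# Counts recovered from the empirical density: count = n * binwidth * density
counts_from_density <- length(x) * binwidth * h$density

# A fitted density is scaled the same way to be comparable with counts
expected_counts <- length(x) * binwidth * dnorm(h$mids, mean = 3.5, sd = 0.4)
```

For a mixture, a function such as `calc.components` already includes the weight `lambda`, so under this sketch only the factor `n * binwidth` would still need to be applied.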

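I downvoted this question because it is too verbose and the bold face reduces readability. Please ask more concise questions. Also, you acknowledge that we want a minimal reproducible example. Please try to create one. – Roland 2014-09-01 07:31:21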

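@Roland: I have no problem with that as long as it is justified, as you just did. Sorry about the bold font; I was trying to emphasize the important elements and points. I will limit its use in the future and will try to write more compact questions. As for the reproducible example, I have just created one and will update my question shortly. Thank you for your help! – 2014-09-01 07:38:25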

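I have finally figured out the issues, deleted my previous answer, and am providing my latest solution below. The only thing I have not solved is the legend panel for the components: it does not appear for some reason, but for an EDA meant to demonstrate the presence of a mixture distribution I think this is good enough. The complete reproducible solution follows. Thanks to those who helped, directly or indirectly.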

library(ggplot2) 
library(scales) 
library(RColorBrewer) 
library(mixtools) 

NUM_COMPONENTS <- 2 

set.seed(12345) # for reproducibility 

data(diamonds, package='ggplot2') # use built-in data 
myData <- diamonds$price 


calc.components <- function(x, mix, comp.number) { 

    mix$lambda[comp.number] * 
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number]) 
} 


overlayHistDensity <- function(data, calc.comp.fun) { 

    # extract 'k' components from mixed distribution 'data' 
    mix.info <- normalmixEM(data, k = NUM_COMPONENTS,
                            maxit = 100, epsilon = 0.01)
    summary(mix.info)

    numComponents <- length(mix.info$sigma)
    message("Extracted number of component distributions: ",
            numComponents)

    DISTRIB_COLORS <-
        suppressWarnings(brewer.pal(NUM_COMPONENTS, "Set1"))

    # create (plot) histogram and ...
    g <- ggplot(as.data.frame(data), aes(x = data)) +
        geom_histogram(aes(y = ..density..),
                       binwidth = 0.01, alpha = 0.5) +
        theme(legend.position = 'top', legend.direction = 'horizontal')

    comp.labels <- lapply(seq(numComponents),
                          function (i) paste("Component", i))

    # ... fitted densities of components
    distComps <- lapply(seq(numComponents), function (i)
        stat_function(fun = calc.comp.fun,
                      args = list(mix = mix.info, comp.number = i),
                      size = 2, color = DISTRIB_COLORS[i]))

    legend <- list(scale_colour_manual(name = "Legend:",
                                       values = DISTRIB_COLORS,
                                       labels = unlist(comp.labels)))

    return (g + distComps + legend)
} 

# pass the function itself rather than its name as a string
overlayPlot <- overlayHistDensity(log10(myData), calc.components)
print(overlayPlot)

Result:

[Figure: histogram of log10-transformed diamond prices with the fitted component densities overlaid]
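
A possible explanation for the missing legend mentioned in this answer is that the line colour is set as a fixed parameter rather than mapped inside `aes()`, so the colour scale has nothing to build a legend from. The sketch below maps the colour in `aes()` instead. It assumes ggplot2 3.3.0 or later (for `after_stat()`), and the `mix` list, the simulated data, and the helper `comp_density` are hypothetical stand-ins for the `normalmixEM()` fit and `calc.components`.

```r
library(ggplot2)

# Hypothetical two-component parameters standing in for a normalmixEM() fit
mix <- list(lambda = c(0.4, 0.6), mu = c(3.0, 3.8), sigma = c(0.15, 0.25))

# Weighted normal density of one component (same form as calc.components)
comp_density <- function(x, mix, k) {
  mix$lambda[k] * dnorm(x, mean = mix$mu[k], sd = mix$sigma[k])
}

set.seed(1)  # simulated values, only to have something to draw a histogram of
k_sim <- sample(1:2, 2000, replace = TRUE, prob = mix$lambda)
df <- data.frame(x = rnorm(2000, mean = mix$mu[k_sim], sd = mix$sigma[k_sim]))

labels <- paste("Component", 1:2)

# Map colour inside aes() so that scale_colour_manual() can create a legend
comp_layers <- lapply(1:2, function(k) {
  stat_function(aes(colour = labels[k]), fun = comp_density,
                args = list(mix = mix, k = k))
})

p <- ggplot(df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.05, alpha = 0.5) +
  comp_layers +
  scale_colour_manual(name = "Legend:",
                      values = setNames(c("#E41A1C", "#377EB8"), labels)) +
  theme(legend.position = "top", legend.direction = "horizontal")
```

Calling `print(p)` draws the plot. Whether this matches the behaviour of the ggplot2 version used in the original answer has not been verified here.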

Source

2014-09-03 09:43:51
