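How can I visualize the difference between probability distribution functions?
I am trying to visualize the difference between two histograms of distribution functions, such as the difference between the following two curves:
When the difference is large, you can simply plot the two curves on top of each other and fill the area between them, as shown above, but this becomes cumbersome when the difference is very small. Another way to plot this is to plot the difference itself, as follows:
However, such a graph seems hard to read for anyone seeing it for the first time, so I was wondering: is there any other way to visualize the difference between two distribution functions?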
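I thought it might be an option to simply combine your two proposals, while scaling up the difference to make it visible.
What follows is an attempt to do this with ggplot2. It was actually a bit more involved than I initially thought, and I am definitely not one hundred percent satisfied with the result; but maybe it helps nevertheless. Comments and improvements are very welcome.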
library(ggplot2)
library(dplyr)
## function that replicates default ggplot2 colors
## taken from [1]
gg_color_hue <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}
## Set up sample data
set.seed(1)
n <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)
df <- bind_rows(data.frame(sample = 1, x = x1), data.frame(sample = 2, x = x2)) %>%
  mutate(sample = as.factor(sample))
## Calculate density estimates
g1 <- ggplot(df, aes(x = x, group = sample, colour = sample)) +
  geom_density() + xlim(0, 10)
gg1 <- ggplot_build(g1)
## Use these estimates (available at the same x coordinates!) for
## calculating the differences.
## Inspired by [2]
x <- gg1$data[[1]]$x[gg1$data[[1]]$group == 1]
y1 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 1]
y2 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 2]
df2 <- data.frame(x = x, ymin = pmin(y1, y2), ymax = pmax(y1, y2),
                  side = (y1 < y2), ydiff = y2 - y1)
g2 <- ggplot(df2) +
  ## constant transparency is set outside aes(), so no alpha legend is created
  geom_ribbon(aes(x = x, ymin = ymin, ymax = ymax, fill = side), alpha = 0.5) +
  ## the difference is scaled by 5 so that it is visible next to the densities
  geom_line(aes(x = x, y = 5 * abs(ydiff), colour = side)) +
  geom_area(aes(x = x, y = 5 * abs(ydiff), fill = side), alpha = 0.4)
g3 <- g2 +
  geom_density(data = df, size = 1, aes(x = x, group = sample, colour = sample)) +
  xlim(0, 10) +
  guides(colour = "none") +
  ylab("Curves: density\n Shaded area: 5 * difference of densities") +
  scale_fill_manual(name = "samples", labels = 1:2, values = gg_color_hue(2)) +
  scale_colour_manual(limits = list(1, 2, FALSE, TRUE), values = rep(gg_color_hue(2), 2))
print(g3)
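The step of pulling the density estimates back out of ggplot_build() can also be sketched without ggplot2. The following is only a minimal sketch, assuming that base R's density() with a fixed evaluation range is an acceptable substitute. The names d1, d2 and diff_df are made up for illustration. Because density() here sees all observations, while xlim(0, 10) in the plot drops those above 10, the values will not match the ggplot2 estimates exactly.

```r
## Sketch (assumption: base R density() instead of ggplot_build())
set.seed(1)
n  <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)

## Using the same from/to/n for both calls evaluates both estimates
## on an identical grid of x values.
d1 <- density(x1, from = 0, to = 10, n = 512)
d2 <- density(x2, from = 0, to = 10, n = 512)
stopifnot(isTRUE(all.equal(d1$x, d2$x)))

## Pointwise difference of the two estimates (hypothetical helper table)
diff_df <- data.frame(x = d1$x, y1 = d1$y, y2 = d2$y, ydiff = d2$y - d1$y)
head(diff_df)
```

A table like diff_df could then take the place of df2 in the plotting code, provided the ymin, ymax and side columns are added in the same way.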
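As suggested by @Gregor in the comments, here is a version that draws two separate plots below each other, sharing the same x-axis scaling. At the very least, the legends should obviously be tweaked.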
library(ggplot2)
library(dplyr)
library(grid)
## function that replicates default ggplot2 colors
## taken from [1]
gg_color_hue <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}
## Set up sample data
set.seed(1)
n <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)
df <- bind_rows(data.frame(sample = 1, x = x1), data.frame(sample = 2, x = x2)) %>%
  mutate(sample = as.factor(sample))
## Calculate density estimates
g1 <- ggplot(df, aes(x = x, group = sample, colour = sample)) +
  geom_density() + xlim(0, 10)
gg1 <- ggplot_build(g1)
## Use these estimates (available at the same x coordinates!) for
## calculating the differences.
## Inspired by [2]
x <- gg1$data[[1]]$x[gg1$data[[1]]$group == 1]
y1 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 1]
y2 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 2]
df2 <- data.frame(x = x, ymin = pmin(y1, y2), ymax = pmax(y1, y2),
                  side = (y1 < y2), ydiff = y2 - y1)
## Upper plot: both densities, with the area between them shaded
g2 <- ggplot(df2) +
  geom_ribbon(aes(x = x, ymin = ymin, ymax = ymax, fill = side), alpha = 0.5) +
  geom_density(data = df, size = 1, aes(x = x, group = sample, colour = sample)) +
  xlim(0, 10) +
  guides(fill = "none")
## Lower plot: absolute difference of the two densities
g3 <- ggplot(df2) +
  geom_line(aes(x = x, y = abs(ydiff), colour = side)) +
  geom_area(aes(x = x, y = abs(ydiff), fill = side), alpha = 0.4) +
  guides(fill = "none")
## See [3]
grid.draw(rbind(ggplotGrob(g2), ggplotGrob(g3), size="last"))
... or with abs(ydiff) replaced by ydiff in the construction of the second plot:
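As a rough illustration of that signed variant, the sketch below plots ydiff itself against a zero reference line. It is not the code behind the original figure: it assumes base R's density() on a shared grid instead of the ggplot_build() extraction, and the names df_signed and g_signed are hypothetical.

```r
library(ggplot2)

## Same sample data as in the answer
set.seed(1)
n  <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)

## Assumption: density() on a common grid stands in for the ggplot2 estimates
d1 <- density(x1, from = 0, to = 10, n = 512)
d2 <- density(x2, from = 0, to = 10, n = 512)
df_signed <- data.frame(x = d1$x, ydiff = d2$y - d1$y)

## Signed difference: positive where sample 2 is denser, negative otherwise
g_signed <- ggplot(df_signed, aes(x = x, y = ydiff)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_line() +
  ylab("density of sample 2 - density of sample 1")
print(g_signed)
```

Keeping the sign shows which sample is denser at each x, at the cost of a y axis that is no longer directly comparable to the density plot above it.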
Source: SO answer 3
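I think this is an interesting question, but it is too open-ended and opinion-based for SO. (And it is not really about programming either.) Perhaps it would be on topic at Cross Validated? – Gregor 2015-03-31 21:53:08
Just to make sure we are talking about the same thing: you want to visualize probability density functions by considering histograms of realizations of those probability distributions, right? Because cumulative distribution functions are something very different... – jhin 2015-03-31 22:34:59
A sample data set would be nice. – jhin 2015-03-31 22:35:09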