2017-09-12 60 views

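ggplot2: wrong boxplot width when faceting with different scales

I need a faceted boxplot. The x-axis of the plot is a quantitative variable, and I want the plot to reflect that information. The scale of the abscissa differs greatly between facets.

My problem is that, in the facet with the large scale, the boxes are very narrow.

One possible explanation is that the box width is the same for all facets, whereas ideally it should be determined separately from each facet's x limits.

I would appreciate input on two points:

  • Do you think this is a bug that should be reported?
  • Do you have a solution?

Thanks in advance! Note: converting the abscissa to a categorical variable could be a solution, but it is not perfect, because some information is lost.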
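To make the information loss concrete, here is a small base R sketch (the variable names are illustrative and not part of the original post). It assumes the usual behaviour of a discrete axis, where factor levels sit at consecutive integer positions, so the wider gap between 4 and 7 would be drawn like any other gap.

```r
# x values from one facet of the example; the gap between 4 and 7 is wider
x <- c(1:4, 7)

# Spacing between neighbouring values on a continuous axis
diff(x)

# After conversion to a factor, only the integer level codes remain,
# so the uneven spacing is no longer visible
pos <- as.integer(factor(x))
diff(pos)
```

The first `diff()` keeps the gap of 3 between the last two values, while the second returns equal steps of 1.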

Minimal working example:

library(tidyverse) 

c(1:4,7) %>% 
    c(.,10*.) %>% # Create abscissa on two different scales 
    lapply(FUN = function(x) {tibble(x = x, y = rnorm(50), idx = ifelse(test = x<8, yes = 'A', no = 'B'))}) %>% # Create sample (y) and label (idx) 
    bind_rows() %>% 
    ggplot(aes(x = x, y = y, group = x)) + 
    geom_boxplot() + 
    facet_wrap(~idx, scales = 'free') 

Result:

[image: resulting plot]

A tedious workaround is to redraw the boxplot from scratch, but this is not very satisfying:

draw_boxplot = function(locations, width, ymin, lower, middle, upper, ymax, idx){ 

    local_df = tibble(locations = locations, width = width, ymin = ymin, lower = lower, middle = middle, upper = upper, ymax = ymax, idx = idx) 

    ggplot(data = local_df) + 
    geom_rect(aes(xmin = locations - width/2, xmax = locations + width/2, ymin = lower, ymax = upper), fill = 'white', colour = 'black') + 
    geom_segment(aes(x = locations - width/2, xend = locations + width/2, y = middle, yend = middle), size = 0.8) + 
    geom_segment(aes(x = locations, xend = locations, y = upper, yend = ymax)) + 
    geom_segment(aes(x = locations, xend = locations, y = lower, yend = ymin)) + 
    facet_wrap(~idx, scales = 'free_x') 
} 

make_boxplot = function(to_plot){ 
    to_plot %>% 
    cmp_boxplot %>% 
    (function(x){ 
     draw_boxplot(locations = x$x, width = x$width, ymin = x$y0, lower = x$y25, middle = x$y50, upper = x$y75, ymax = x$y100, idx = x$idx) 
    }) 

} 


cmp_boxplot = function(to_plot){ 
    to_plot %>% 
    group_by(idx) %>% 
    mutate(width = 0.6*(max(x) - min(x))/length(unique(x))) %>% #hand specified width 
    group_by(x) %>% 
    mutate(y0 = min(y), 
      y25 = quantile(y, 0.25), 
      y50 = median(y), 
      y75 = quantile(y, 0.75), 
      y100 = max(y)) %>% 
    select(-y) %>% 
    unique() 
} 

c(1:4,7) %>% 
    c(.,10*.) %>% 
    lapply(FUN = function(x) {tibble(x = x, y = rnorm(50), idx = ifelse(test = x<8, yes = 'A', no = 'B'))}) %>% 
    bind_rows() %>% 
    make_boxplot 

Result:

[image: resulting plot]

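In general, boxplots are (or should be) used for categorical variables rather than numeric ones... –

Try: 'ggplot(aes(x = as.factor(x), y = y))' – missuse

Thanks missuse, but I think your suggestion amounts to: "converting the abscissa to a categorical variable could be a solution, but it is not perfect, because some information is lost." – konkam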

Answers


Since geom_boxplot does not allow width to vary as an aesthetic, you have to write your own. Fortunately, it isn't too complicated.

bp_custom <- function(vals, type) { 

    bp = boxplot.stats(vals) 

    if(type == "whiskers") { 
    y = bp$stats[1] 
    yend = bp$stats[5] 
    return(data.frame(y = y, yend = yend)) 
    } 

    if(type == "box") { 
    ymin = bp$stats[2] 
    ymax = bp$stats[4] 
    return(data.frame(ymin = ymin, ymax = ymax)) 
    } 

    if(type == "median") { 
    y = median(vals) 
    yend = median(vals) 
    return(data.frame(y = y, yend = yend)) 
    } 

    if(type == "outliers") { 
    y = bp$out 
    return(data.frame(y = y)) 
    } else { 
    return(warning("Type must be one of 'whiskers', 'box', 'median', or 'outliers'.")) 
    } 
} 
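As a quick check of what bp_custom relies on, the sketch below calls base R's boxplot.stats on a small made-up vector (the data are illustrative, not from the question) and prints the two components the function reads.

```r
# Small sample with one obvious extreme value (illustrative data)
vals <- c(1, 2, 3, 4, 100)

bp <- boxplot.stats(vals)

# stats holds, in order: lower whisker, lower hinge, median,
# upper hinge, upper whisker
bp$stats

# out holds the points lying beyond 1.5 * IQR from the hinges
# (1.5 is the default value of the coef argument)
bp$out
```

Here the whiskers stop at 1 and 4 and the value 100 is reported in `out`, which matches how bp_custom uses `stats[1]` and `stats[5]` for the whiskers, `stats[2]` and `stats[4]` for the box, and `out` for the outlier points.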

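This function does all the calculations and returns data frames suitable for use with stat_summary. We then call it in several different layers to build the individual pieces of the boxplot. Note that you need to compute the box width for each facet, which is done in the pipeline with dplyr. I computed the width so that the range of x is divided into equal segments according to the number of unique x values, and each box then gets roughly 1/2 of that segment width. Your data may need a different adjustment.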

library(dplyr) 
library(tibble)   # tibble() 
library(ggplot2)  # ggplot(), stat_summary(), facet_wrap() 

c(1:4,7) %>% 
    c(.,10*.) %>% # Create abscissa on two different scales 
    lapply(FUN = function(x) { 
    tibble(x = x, y = rnorm(50), idx = ifelse(test = x<8, yes = 'A', no = 'B')) 
    }) %>% 
    bind_rows() %>%         
    group_by(idx) %>%            # NOTE THIS LINE 
    mutate(width = 0.25*diff(range(x))/length(unique(x))) %>%  # NOTE THIS LINE 
    ggplot(aes(x = x, y = y, group = x)) + 
    stat_summary(fun.data = bp_custom, fun.args = "whiskers", 
       geom = "segment", aes(xend = x)) + 
    stat_summary(fun.data = bp_custom, fun.args = "box", 
       geom = "rect", aes(xmin = x - width, xmax = x + width), 
       fill = "white", color = "black") + 
    stat_summary(fun.data = bp_custom, fun.args = "median", 
       geom = "segment", aes(x = x - width, xend = x + width), size = 1.5) + 
    stat_summary(fun.data = bp_custom, fun.args = "outliers", 
       geom = "point") + 
    facet_wrap(~idx, scales = 'free') 

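To see what the width step produces for this example, the arithmetic can be reproduced on its own. This is a base R sketch using tapply instead of the dplyr group_by/mutate pair; the name half_width is introduced here for illustration only.

```r
# Same x values and facet labels as in the example
x <- c(1:4, 7)
x <- c(x, 10 * x)
idx <- ifelse(x < 8, "A", "B")

# Per facet: a quarter of (range of x / number of distinct x values).
# The box is drawn from x - width to x + width, so it spans about half
# of the segment allotted to each x value.
half_width <- tapply(x, idx, function(v) {
  0.25 * diff(range(v)) / length(unique(v))
})
half_width
```

Facet A gets 0.25 * 6 / 5 = 0.3 and facet B gets 0.25 * 60 / 5 = 3, so the boxes scale with each facet's own x range.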

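As for reporting this as a bug (it is really a feature request), I think this is a rare enough use case that they won't prioritize it. If you wrap this code into a custom geom (based on here) and submit a pull request, you might have better luck.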

Thank you very much Brian, this is a very nice and useful answer – konkam