2016-11-08 77 views
4

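Stacked bar chart in violin plot shape

Maybe this is a stupid idea, or maybe it's a brainwave. I have a dataset of lipid classes in 4 different species. The data are proportional, and each row sums to 1000. I want to visualise the differences in the proportion of each class in each species. Normally a stacked bar would be the way to go here, but there are several classes, and it becomes uninterpretable because only the bottom class shares a common baseline (see below).

Traditional stacked bar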

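And that seems to be the best option of a bad bunch, with pie and doughnut charts being universally scorned. I was then inspired by the question "Symmetrical, violin plot-like histogram?", which creates a sort of stacked distribution violin plot (see below).

Stacked distribution violin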

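I am wondering whether this could somehow be converted into a stacked violin, such that each segment represents one whole variable. With my data, species A and D would be "fat" at the TAG segment and "skinny" at the STEROL segment. That way the proportions are depicted horizontally and always share a common baseline. Thoughts?

Data: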

DF <- structure(list(Sample = c("A", "A", "A", "B", "B", "B", "C", 
"C", "C", "D", "D"), WAX = c(83.7179798600773, 317.364310355766, 
20.0147496567679, 93.0194886619568, 78.7886829173726, 79.3445694220837, 
91.0020522660375, 88.1542855137005, 78.3313314713951, 78.4449591023115, 
236.150030864875), TAG = c(67.4640254081232, 313.243238213156, 
451.287867136276, 76.308508343969, 40.127554151831, 91.1910102221636, 
61.658394708941, 104.617259648364, 60.7502685224869, 80.8373642262043, 
485.88633863193), FFA = c(41.0963382465756, 149.264019576272, 
129.672579626868, 51.049208042632, 13.7282635713804, 30.0088572108344, 
47.8878116348504, 47.9564218319094, 30.3836532949481, 34.8474205480686, 
10.9218910757234), `DAG1,2` = c(140.35876401479, 42.4556176551009, 
0, 0, 144.993393432366, 136.722412691012, 0, 140.027443968931, 
137.579074961889, 129.935353616471, 46.6128854387559), STEROL = c(73.0144390122309, 
24.1680929257195, 41.8258704279641, 78.906816661241, 67.5678558060943, 
66.7150537517493, 82.4794113296791, 76.7443442992891, 68.9357008866253, 
64.5444668132533, 29.8342694785768), AMPL = c(251.446564854412, 
57.8713327050339, 306.155806819949, 238.853696442419, 201.783872969561, 
175.935515655693, 234.169038776536, 211.986239116884, 196.931330316831, 
222.658181144794, 73.8944654414811), PE = c(167.99718650752, 
43.3839497916674, 22.1937177530762, 150.315149187176, 153.632530721031, 
141.580725482114, 164.215442147509, 155.113323256627, 143.349000132624, 
128.504657216928, 50.6281347160092), PC = c(174.904702096271, 
52.2494387772846, 28.8494085790995, 191.038328534942, 190.183655117756, 
175.33290326259, 199.2632149392, 175.400682364295, 176.64926273487, 
163.075864395099, 66.071984352649), LPC = c(0, 0, 0, 120.508804125665, 
109.194191312608, 103.16895230176, 119.324634197247, 0, 107.09037767833, 
97.151732936871, 0)), class = c("tbl_df", "tbl", "data.frame" 
), row.names = c(NA, -11L), .Names = c("Sample", "WAX", "TAG", 
"FFA", "DAG1,2", "STEROL", "AMPL", "PE", "PC", "LPC")) 
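
Before plotting, it may be worth confirming that each row really adds up to 1000, since every chart discussed here relies on that. The following is only a sketch: `check_row_totals` is a hypothetical helper (it does not come from the post or from any package), the tolerance is an arbitrary choice, and the two rows are rows 1 and 11 of the data above rounded to three decimals.

```r
# Hypothetical helper (not part of the original post): reports, for each row,
# the sum over all lipid classes and whether it matches the expected total.
check_row_totals <- function(df, id_col = "Sample", total = 1000, tol = 0.01) {
  values <- df[, setdiff(names(df), id_col), drop = FALSE]
  sums <- unname(rowSums(values))
  data.frame(id = df[[id_col]], sum = sums, ok = abs(sums - total) <= tol)
}

# Rows 1 and 11 of the posted data, rounded to three decimals
excerpt <- data.frame(
  Sample   = c("A", "D"),
  WAX      = c(83.718, 236.150),
  TAG      = c(67.464, 485.886),
  FFA      = c(41.096, 10.922),
  `DAG1,2` = c(140.359, 46.613),
  STEROL   = c(73.014, 29.834),
  AMPL     = c(251.447, 73.894),
  PE       = c(167.997, 50.628),
  PC       = c(174.905, 66.072),
  LPC      = c(0, 0),
  check.names = FALSE
)

totals <- check_row_totals(excerpt)
print(totals)
```

The same helper can be called on the full data frame; `table()` on the `Sample` column then shows how many replicate rows each species has.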
+0

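You have multiple rows per sample. What do you want to do with them? Sum them, or show the distribution of the values within each sample-variable combination? –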

+0

@Jan van der Laan Show the means, as a stacked bar would. –

+0

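I focused on the plotting in my answer and didn't even realise you have multiple values per sample. You have to aggregate outside of ggplot2 first. – Roland

Answers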

2

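This is basically a horizontal bar chart:

library(reshape2)
library(ggplot2)

# Long format: one row per Sample / lipid class / value
DFm <- melt(DF, id.vars = "Sample")
# Average the replicate rows first (see the comments on the question),
# so that each Sample / class pair contributes a single bar
DFm <- aggregate(value ~ Sample + variable, data = DFm, FUN = mean)
# Mirror the values around zero so every bar is symmetric, like a violin
DFm1 <- DFm
DFm1$value <- -DFm1$value
DFm <- rbind(DFm, DFm1)

# value / 10 turns the per-1000 proportions into percentages
ggplot(DFm, aes(x = "A", y = value / 10, fill = variable, color = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +
  theme_minimal() +
  facet_wrap(~ Sample, nrow = 1, switch = "x") +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank())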

resulting plot

+0

Just an idea (maybe not what the OP wants): wouldn't it be better to stack from largest to smallest? – zx8754

+1

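@zx8754 Can we assume that such an order would be the same for all samples? If we can't, I wouldn't do that. In any case, it is easy to do (and can be left as an exercise for the reader). – Roland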

+0

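No, I would want them in a different order for each sample, so that it shows which variable is more "important" in each group, more like a "stacking toy". I know it would be hard to compare values when they are not in the same order, so we could add some lines. Agreed, leave it as an exercise for the reader. – zx8754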
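
One way to approach the per-sample ordering discussed in these comments is to rank the classes within each sample before mirroring the values. This is a minimal sketch, not the answerer's code: the `rank` column is a name introduced here for illustration, and the six values are approximate per-sample means of three classes from the posted data, rounded to whole numbers.

```r
# Approximate per-sample means of three classes from the posted data,
# rounded to whole numbers to keep the example short.
means <- data.frame(
  Sample   = rep(c("A", "B"), each = 3),
  variable = rep(c("WAX", "TAG", "STEROL"), times = 2),
  value    = c(140, 277, 46, 84, 69, 71)
)

# Rank the classes within each sample: 1 = largest mean in that sample.
# `rank` is a hypothetical column name, not something the answer defines.
means$rank <- ave(-means$value, means$Sample, FUN = rank)

print(means[order(means$Sample, means$rank), ])
```

If the ranking is done before the `rbind()` that mirrors the values, adding `group = rank` to the `aes()` call should make `position = "dodge"` lay the bars out by rank within each facet, while `fill = variable` keeps the colours consistent across samples. The plotting step is not shown here and has not been checked against the posted figure.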