How to plot data (Var1 vs Var2 as a line plot) with a large number of factors

I have some data of the following form:

Factor Var1 Var2 
1  100 1.5 
2  150 1.2 
3  90 1.9 
...... 
1  80 2.0 
2  96 2.1 
3  50 2.9

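I need to compare how Var2 varies with Var1 for the different factors. The idea is to find the range of Var1 values that corresponds to low Var2 values, and for which factor. I have up to 32 factors.

What is the best way to do this?

So far I have done this in ggplot (see the figure below).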

ggplot(data = df, aes(x = df$var1, y = df$var2, colour = df$Factor)) + 
    geom_line(size=0.05) + 
    geom_point(size=0.8) + 
    coord_cartesian(ylim = c(0,5)) + 
    labs(x='var1', y='var2')

ggplot Figure

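However, the figure is very cluttered, and it is hard to tell the different factors apart, especially because the colours are assigned as a gradient.

I also tried using different point shapes.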

ggplot(data = df, aes(x = df$var1, y = df$var2, colour = df$Factor)) + 
    geom_line(size=0.05) + 
    geom_point(size=0.8, aes(shape=factor(df$Factor))) + 
    coord_cartesian(ylim = c(0,5)) + 
    labs(x='var1', y='var2')

However, this gives a warning message (see below) and does not show symbols for all of the factors.

# Warning messages: 
# 1: The shape palette can deal with a maximum of 6 discrete values because more than 6 becomes difficult to discriminate; you have 29. Consider specifying shapes manually if you must have them.

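What is the best way to visualise this kind of variation in the data? The number of factors may vary (up to 32).

As suggested in the comments, I tried facet_wrap (see the figure below).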

ggplot(data = df, aes(x = df$var1, y = df$var2)) + 
    geom_line(size=0.05) + 
    geom_point(size=0.8) + 
    coord_cartesian(ylim = c(0,5)) + 
    facet_wrap(~ df$Factor) + 
    labs(x='var1', y='var2') #+ geom_hline(yintercept = 2)

facet_wrap fig

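To compare the factors, I would like to add a horizontal line to all of the panels. But geom_hline(yintercept = 2) does not do the trick and gives the following error message: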

Error in `$<-.data.frame`(`*tmp*`, "PANEL", value = c(6L, 8L, 24L, 26L, : replacement has 1170 rows, data has 1

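How can I add a horizontal line to all of these panels? Alternatively, is there a way to split the data frame into smaller data frames of 5-6 factors each, rather than all of them, and plot each of these smaller sets?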

Source

2017-06-06 Sree

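'facet_wrap(~ Factor)'?

If not, consider 'plotly::ggplotly', which lets you interact with the resulting figure: you can zoom in to a specific section, or isolate a specific factor by double-clicking on it in the legend.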
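A minimal sketch of the 'plotly::ggplotly' suggestion, assuming the plotly package is installed. The data frame here is a made-up stand-in (five factors named like those in the answer below), not the asker's data, and 'widget' is a name chosen only for illustration.

```r
library(ggplot2)
library(plotly)

# Hypothetical stand-in data: 5 factors, 10 rows each.
set.seed(1)
df <- data.frame(
  Factor = rep(paste0("2MCT ", 101:105), each = 10),
  var1   = rep(seq(50, 150, length.out = 10), times = 5),
  var2   = runif(50, min = 1, max = 3)
)

# Build the static plot with bare column names, then convert it.
p <- ggplot(df, aes(x = var1, y = var2, colour = Factor)) +
  geom_line() +
  geom_point(size = 0.8) +
  labs(x = "var1", y = "var2")

# ggplotly() turns the ggplot object into an interactive htmlwidget.
widget <- ggplotly(p)
```

Printing 'widget' in an interactive session opens the figure, where legend entries can be clicked to hide or isolate individual factors.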

If your factor levels have an inherent order, it may be better to plot them with a colour gradient, as if they were numeric:

# Strip the "2MCT " prefix so the factor can be treated as a number
df$Factor_numeric <- as.numeric(gsub("2MCT ", "", df$Factor)) 
ggplot(data = df, 
     aes(x = var1, 
      y = var2, 
      group = Factor,   # one line per factor, even with a continuous colour
      colour = Factor_numeric)) + 
    geom_line(size=0.05) + 
    scale_colour_gradient(name = "Your Factor", 
         labels = function(breaks) paste0("2MCT ", breaks)) + 
    coord_cartesian(ylim = c(0,5)) + 
    labs(x='var1', y='var2')

Another option is to group your factors using a new column in the data frame, so that you end up with fewer plots:

df$grouped_Factor <- NA 
df$grouped_Factor[df$Factor %in% paste0("2MCT ", 101:108)] <- "G1" 
df$grouped_Factor[df$Factor %in% paste0("2MCT ", 109:118)] <- "G2" 
df$grouped_Factor[df$Factor %in% paste0("2MCT ", 119:124)] <- "G3" 
df$grouped_Factor[df$Factor %in% paste0("2MCT ", 125:132)] <- "G4" 

ggplot(data = df, 
     aes(x = var1, y = var2, colour = Factor)) + 
    geom_line(size=0.05) + 
    geom_point(size=0.8) + 
    coord_cartesian(ylim = c(0,5)) + 
    facet_wrap(~ grouped_Factor) + 
    labs(x='var1', y='var2')
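The manual grouping above can also be done automatically, and the same plot can carry the horizontal reference line asked about in the question. The following is a minimal sketch under stated assumptions, not a tested solution for the original data: the data frame is a made-up stand-in with 32 factors named '2MCT 101' to '2MCT 132', and chunk_size, lvls and chunk_id are names introduced only for illustration. It also assumes that the error in the question comes from using df$... inside aes() and facet_wrap(), which pushes 1170 values onto the one-row data of geom_hline(); referring to columns by bare name avoids that.

```r
library(ggplot2)

# Hypothetical stand-in data: 32 factors, 10 rows each.
set.seed(1)
df <- data.frame(
  Factor = rep(paste0("2MCT ", 101:132), each = 10),
  var1   = rep(seq(50, 150, length.out = 10), times = 32),
  var2   = runif(320, min = 1, max = 3)
)

# Assign every factor level to a chunk of at most chunk_size levels.
chunk_size <- 6
lvls       <- sort(unique(df$Factor))
chunk_id   <- ceiling(seq_along(lvls) / chunk_size)
df$grouped_Factor <- paste0("G", chunk_id[match(df$Factor, lvls)])

# Bare column names everywhere, so geom_hline() is repeated in each panel.
p <- ggplot(df, aes(x = var1, y = var2, colour = Factor)) +
  geom_line() +
  geom_point(size = 0.8) +
  geom_hline(yintercept = 2, linetype = "dashed") +
  coord_cartesian(ylim = c(0, 5)) +
  facet_wrap(~ grouped_Factor) +
  labs(x = "var1", y = "var2")
print(p)
```

With 32 factors and chunk_size set to 6, this gives six panels; the last one holds the two remaining factors.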

Source

2017-06-06 10:42:08 zeehio

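I tried grouping the factors and plotting. However, the colours are assigned as a gradient, so it is hard to tell clearly from the plot which factor has the higher Var2 value for a given range of var1. Is it possible to specify the colours or use different symbols? – Sree

See '?scale_colour_manual' or '?scale_colour_gradient' – zeehio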
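One way to act on the 'scale_colour_manual' pointer is to recycle a short palette of clearly different colours and shapes across the factor levels, so that the factors sharing a panel stay distinguishable. This is only a sketch: the data frame is a made-up stand-in, the six hex colours and six shape codes are arbitrary choices, and base_colours, base_shapes, colour_map and shape_map are illustrative names. It assumes the panels are chunks of six consecutive levels, matching the palette length.

```r
library(ggplot2)

# Hypothetical stand-in data: 32 factors, 10 rows each.
set.seed(1)
df <- data.frame(
  Factor = rep(paste0("2MCT ", 101:132), each = 10),
  var1   = rep(seq(50, 150, length.out = 10), times = 32),
  var2   = runif(320, min = 1, max = 3)
)

# Panels of six consecutive factor levels.
chunk_size <- 6
lvls <- sort(unique(df$Factor))
df$grouped_Factor <- paste0("G", ceiling(match(df$Factor, lvls) / chunk_size))

# Six distinct colours and shapes, recycled over all levels. Because each
# panel holds six consecutive levels, no two factors in a panel coincide.
base_colours <- c("#E69F00", "#56B4E9", "#009E73", "#D55E00", "#0072B2", "#CC79A7")
base_shapes  <- c(16, 17, 15, 3, 4, 8)
colour_map <- setNames(rep(base_colours, length.out = length(lvls)), lvls)
shape_map  <- setNames(rep(base_shapes,  length.out = length(lvls)), lvls)

p <- ggplot(df, aes(x = var1, y = var2, colour = Factor, shape = Factor)) +
  geom_line() +
  geom_point(size = 1.5) +
  scale_colour_manual(values = colour_map) +
  scale_shape_manual(values = shape_map) +
  coord_cartesian(ylim = c(0, 5)) +
  facet_wrap(~ grouped_Factor) +
  labs(x = "var1", y = "var2")
print(p)
```

Supplying the shapes through scale_shape_manual() is also what the warning in the question suggests when there are more than six discrete values.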
