2017-02-23

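dendextend: color_branches does not work for some hclust methods

I am using the R dendextend package to plot hclust tree objects generated by each of the methods available in hclust {stats}: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).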

I noticed that the color coding from color_branches fails when I use method = "median" or method = "centroid".

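I tested this with a randomly generated matrix and reproduced the error for both the "median" and "centroid" methods. Is there a specific reason for this?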

See the linked output plots: Fig. 1. hclust methods (a) ward.D2, (b) median, (c) centroid

library(dendextend) 
set.seed(1) 
df <- as.data.frame(replicate(10, rnorm(20))) 
df.names <- rep(c("black", "red", "blue", "green", "cyan"), 2) 
df.col <- rep(c("black", "red", "blue", "green", "cyan"), 2) 
colnames(df) <- df.names 
df.dist <- dist(t(df), method = "euclidean") 

# plotting works for "ward.D", "ward.D2", "single", "complete", "average", "mcquitty" 
dend <- as.dendrogram(hclust(df.dist, method = "ward.D2"), labels = df.names) 
labels_colors(dend) <- df.col[order.dendrogram(dend)] 
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)]) 
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE) 

# color_branches fails for "median" or "centroid" 
dend <- as.dendrogram(hclust(df.dist, method = "median"), labels = df.names) 
labels_colors(dend) <- df.col[order.dendrogram(dend)] 
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)]) 
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE) 

dend <- as.dendrogram(hclust(df.dist, method = "centroid"), labels = df.names) 
labels_colors(dend) <- df.col[order.dendrogram(dend)] 
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)]) 
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE) 

I am using dendextend_1.4.0. Session info is as follows:

sessionInfo() 
R version 3.3.2 (2016-10-31) 
Platform: x86_64-apple-darwin13.4.0 (64-bit) 
Running under: macOS Sierra 10.12.3 

Thanks.

It works fine for me. What exact output do you get? Please paste it.

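OK, I see what you mean now. The problem is that this code produces clusters with "weird" tree heights. In that case it is not clear to me how to fix it, because the meaning of a "cut" is ambiguous.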

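Hi Tal, yes, I suspected it was related to the "weird" tree heights produced by my data. But since I was able to reproduce it with a random matrix, I am curious whether it is tied to the clustering method, that is, whether these methods tend to generate trees of this kind. The color coding of the labels works. Is there a way for me to edit the code so that it flags an ambiguous cut and assigns the branch colors according to the label order?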
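On whether the clustering method itself is responsible: the hclust documentation notes that "median" and "centroid" linkage do not yield a monotone distance measure, so a later merge can occur at a lower height than an earlier one (an inversion). The sketch below shows this on a hand-made three-point dissimilarity matrix using base R only. The helper has_inversions is a hypothetical name introduced here for illustration; it is not part of stats or dendextend.

```r
# Hypothetical helper (not part of stats or dendextend): TRUE when some merge
# happens at a lower height than the merge that preceded it.
has_inversions <- function(hc) is.unsorted(hc$height)

# Three items that are almost equidistant, given directly as dissimilarities.
m <- matrix(c(0,    1,    1.01,
              1,    0,    1.01,
              1.01, 1.01, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
d <- as.dist(m)

hc_average  <- hclust(d, method = "average")
hc_centroid <- hclust(d, method = "centroid")

has_inversions(hc_average)   # FALSE: merge heights are non-decreasing
has_inversions(hc_centroid)  # TRUE: the second merge is lower than the first
hc_centroid$height           # inspect the merge heights directly
```

The same check can be run on the objects from the question, for example has_inversions(hclust(df.dist, method = "median")), to see whether a particular tree is affected before cutting it into k clusters.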

Answer

This can be solved with branches_attr_by_clusters (although it can be a bit tricky; see the example below):

library(dendextend) 
set.seed(1) 
df <- as.data.frame(replicate(10, rnorm(20))) 
df.names <- rep(c("black", "red", "blue", "green", "cyan"), 2) 
df.col <- rep(c("black", "red", "blue", "green", "cyan"), 2) 
colnames(df) <- df.names 
df.dist <- dist(t(df), method = "euclidean") 

# plotting works for "ward.D", "ward.D2", "single", "complete", "average", "mcquitty" 
dend <- as.dendrogram(hclust(df.dist, method = "ward.D2"), labels = df.names) 
labels_colors(dend) <- df.col[order.dendrogram(dend)] 
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)]) 
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE) 

# color_branches fails for "median" or "centroid" 
dend <- as.dendrogram(hclust(df.dist, method = "median"), labels = df.names) 
aa <- df.col[order.dendrogram(dend)] 
labels_colors(dend) <- aa 
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)]) 
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE) 

aa <- factor(aa, levels = unique(aa)) 
dend %>% branches_attr_by_clusters(aa, value = levels(aa)) %>% plot 
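Because k equals the number of leaves here, every "cluster" is a single leaf, so another option that avoids cutting the tree altogether may be to color only the terminal branches, in label order. This is a minimal sketch and not the answer's method; it assumes that assign_values_to_leaves_edgePar and get_leaves_branches_col are available in the installed dendextend version. The names leaf_cols and dend_leaf are introduced here for illustration.

```r
library(dendextend)

# Same random data as in the question.
set.seed(1)
df <- as.data.frame(replicate(10, rnorm(20)))
df.col <- rep(c("black", "red", "blue", "green", "cyan"), 2)
colnames(df) <- df.col
dend <- as.dendrogram(hclust(dist(t(df), method = "euclidean"), method = "median"))

# Colors in the left-to-right order of the leaves.
leaf_cols <- df.col[order.dendrogram(dend)]
labels_colors(dend) <- leaf_cols

# Color only the edge leading to each leaf; no tree cut is involved,
# so inversions in the merge heights do not matter.
dend_leaf <- assign_values_to_leaves_edgePar(dend, value = leaf_cols, edgePar = "col")
dend_leaf %>% set("branches_lwd", 3) %>% plot(horiz = TRUE)
```

Internal branches stay in the default color with this approach, which seems reasonable when the colors identify individual leaves rather than multi-leaf clusters.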
