2015-06-29 34 views
3

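How to color the labels of a dendrogram according to defined groups? (in R)

I have a numeric matrix in R with 24 rows and 10,000 columns. The row names of this matrix are the names of the files from which I read the data for these 24 rows. Apart from this, I have a separate factor of length 24 specifying the group to which each of the 24 files belongs. There are 3 groups: Alcohol, Hydrocarbon and Ester. The names and the corresponding groups look like this: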

> MS.mz 
[1] "int-354.19" "int-361.35" "int-368.35" "int-396.38" "int-408.41" "int-410.43" "int-422.43" 
[8] "int-424.42" "int-436.44" "int-438.46" "int-452.00" "int-480.48" "int-648.64" "int-312.14" 
[15] "int-676.68" "int-690.62" "int-704.75" "int-312.29" "int-326.09" "int-326.18" "int-326.31" 
[22] "int-340.21" "int-340.32" "int-352.35" 

> MS.groups 
[1] Alcohol  Alcohol  Alcohol  Alcohol  Hydrocarbon Alcohol  Hydrocarbon Alcohol  
[9] Hydrocarbon Alcohol  Alcohol  Alcohol  Ester  Alcohol  Ester  Ester  
[17] Ester  Alcohol  Alcohol  Alcohol  Alcohol  Alcohol  Alcohol  Hydrocarbon 
Levels: Alcohol Ester Hydrocarbon 

I would like to generate a dendrogram to see how the data in the matrix cluster. So I used the following commands:

require(vegan) 
dist.mat<-vegdist(MS.data.scaled.transposed,method="euclidean") 
clust.res<-hclust(dist.mat) 
plot(clust.res) 

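I got a dendrogram. Now I want to color the file names in the dendrogram according to the group they belong to, i.e. Alcohol, Hydrocarbon or Ester. I looked at different examples posted on the forum, such as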

Label and color leaf dendrogram in r

Label and color leaf dendrogram in R using ape package

Clustering with bootstrapping

but could not make any of them work for my data. I am not sure how to associate row.names with MS.groups to get colored names in the dendrogram.

On generating the tree using dendextend (as explained in https://nycdatascience.com/wp-content/uploads/2013/09/dendextend-tutorial.pdf), I get the following tree:

[Image: the resulting dendrogram]

Here is the code used to generate it:

require(colorspace) 
require(dendextend) 
d_SIMS <- dist(firstpointsample5[,-1]) 
hc_SIMS <- hclust(d_SIMS) 
labels(hc_SIMS) 
dend_SIMS <- as.dendrogram(hc_SIMS) 
SIMS_groups <- rev(levels(firstpointsample5[, 1])) 
dend_SIMS <- color_branches(dend_SIMS, k = 3, groupLabels = SIMS_groups) 
is.character(labels(dend_SIMS)) 
plot(dend_SIMS) 
labels_colors(dend_SIMS) <- rainbow_hcl(3)[sort_levels_values(as.numeric(firstpointsample5[,1])[order.dendrogram(dend_SIMS)])] 
labels(dend_SIMS) <- paste(as.character(firstpointsample5[, 1])[order.dendrogram(dend_SIMS)],"(", labels(dend_SIMS), ")", sep = "") 
dend_SIMS <- hang.dendrogram(dend_SIMS, hang_height = 0.1) 
dend_SIMS <- assign_values_to_leaves_nodePar(dend_SIMS, 0.5,"lab.cex") 
par(mar = c(3, 3, 3, 7)) 
plot(dend_SIMS, main = "Clustered SIMS dataset\n (the labels give the true m/z groups)",horiz = TRUE, nodePar = list(cex = 0.007)) 
legend("topleft", legend = SIMS_groups, fill = rainbow_hcl(3)) 

Answers

6

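I suspect the function you are looking for is either color_labels or get_leaves_branches_col. The first colors the labels based on cutree (as color_branches does). The second lets you get the color of the branch leading to each leaf and then use it to color the labels of the tree (useful if you colored the branches by an unusual method, as happens when using branches_attr_by_labels). For example: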

# define dendrogram object to play with: 
hc <- hclust(dist(USArrests[1:5,]), "ave") 
dend <- as.dendrogram(hc) 

library(dendextend) 
par(mfrow = c(1,2), mar = c(5,2,1,0)) 
dend <- dend %>% 
     color_branches(k = 3) %>% 
     set("branches_lwd", c(2,1,2)) %>% 
     set("branches_lty", c(1,2,1)) 

plot(dend) 

dend <- color_labels(dend, k = 3) 
# The same as: 
# labels_colors(dend) <- get_leaves_branches_col(dend) 
plot(dend) 

[Image: the two dendrograms produced by the code above]

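Either way, you should always have a look at the set function for ideas on what can be done to your dendrogram (this saves the hassle of remembering all the different function names).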
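When the groups are already known, as in the question, the labels can also be colored directly from the factor rather than from cutree. The following is a minimal sketch, not the asker's actual data: the matrix `x`, the factor `groups` and the three colors are made up for illustration. It uses `labels_colors<-` from dendextend together with `order.dendrogram`, and builds the legend from the same named color vector so the legend cannot disagree with the labels.

```r
library(dendextend)

# Made-up stand-in for the real data: 6 samples, 4 variables.
set.seed(1)
x <- matrix(rnorm(24), nrow = 6,
            dimnames = list(paste0("int-", 1:6), NULL))
groups <- factor(c("Alcohol", "Ester", "Alcohol",
                   "Hydrocarbon", "Ester", "Alcohol"))

dend <- as.dendrogram(hclust(dist(x)))

# One colour per factor level, named by level (colours chosen arbitrarily).
palette3 <- c("red", "darkgreen", "blue")
names(palette3) <- levels(groups)

# order.dendrogram() returns the original row index of each leaf, left to
# right, so indexing by it puts the colours in the order the leaves are drawn.
leaf_cols <- unname(palette3[as.character(groups)][order.dendrogram(dend)])
labels_colors(dend) <- leaf_cols

plot(dend)
# The legend reuses the same named vector as the labels.
legend("topright", legend = names(palette3), fill = palette3)
```

Looking colors up by level name, instead of by the numeric code of the factor, avoids depending on the order of the levels, which is one plausible way for label colors and legend entries to end up swapped.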

1

You may want to have a look at this tutorial, which shows several solutions for visualizing dendrograms by group in R:

https://rstudio-pubs-static.s3.amazonaws.com/1876_df0bf890dd54461f98719b461d987c3d.html

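However, I think the best solution for your data is provided by the package "dendextend". See the tutorial (for example the part on the "iris" dataset, which is similar to your problem): https://nycdatascience.com/wp-content/uploads/2013/09/dendextend-tutorial.pdf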

See also the vignette: http://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html

+0

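Yes, I have already looked at these links, but the resulting tree does not make sense. I have added the generated tree and the code to the question. In the new dendrogram, alcohols are labelled with the color of the hydrocarbons, and hydrocarbons with the color of the alcohols. Is there an error in the code? – novicegeek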

+0

Sorry, but neither 'MS.data.scaled.transposed' nor 'firstpointsample5' is available, so I cannot reproduce your example – user3875022

0

You can try this solution; just replace 'labs' with 'MS.groups' and 'var' with 'MS.groups' converted to numeric (perhaps with as.numeric). It comes from How to colour the labels of a dendrogram by an additional factor variable in R

## The data 
df <- structure(list(labs = c("a1", "a2", "a3", "a4", "a5", "a6", "a7", 
"a8", "b1", "b2", "b3", "b4", "b5", "b6", "b7"), var = c(1L, 1L, 2L,  
1L,2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L), td = c(13.1, 14.5, 16.7, 
12.9, 14.9, 15.6, 13.4, 15.3, 12.8, 14.5, 14.7, 13.1, 14.9, 15.6, 14.6), 
fd = c(2L, 3L, 3L, 1L, 2L, 3L, 2L, 3L, 2L, 4L, 2L, 1L, 4L, 3L, 3L)), 
.Names = c("labs", "var", "td", "fd"), class = "data.frame", row.names = 
c(NA, -15L)) 

## Subset for clustering 
df.nw = df[,3:4] 

# Assign the labs column to a vector 
labs = df$labs 

d = dist(as.matrix(df.nw))       # find distance matrix 
hc = hclust(d, method="complete")     # apply hierarchical clustering 

## plot the dendrogram 

plot(hc, hang=-0.01, cex=0.6, labels=labs, xlab="") 

## convert hclust to dendrogram 
hcd = as.dendrogram(hc)        

## plot using dendrogram object 
plot(hcd, cex=0.6)         

Var = df$var          # factor variable for colours 
varCol = gsub("1","red",Var)      # convert numbers to colours 
varCol = gsub("2","blue",varCol) 

# colour-code dendrogram branches by a factor 

# ... your code 
colLab <- function(n) { 
    if(is.leaf(n)) { 
    a <- attributes(n) 
    attr(n, "label") <- labs[a$label] 
    attr(n, "nodePar") <- c(a$nodePar, lab.col = varCol[a$label]) 
    } 
    n 
} 

## Coloured plot 
plot(dendrapply(hcd, colLab)) 
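
The code above indexes `labs` and `varCol` with the leaf label, which is convenient when the data have no row names. When the rows are named, as with the file names in the question, a lookup by name may be easier to reason about. The following base R sketch assumes made-up data (`x`, `groups`) and arbitrarily chosen colors; `colour_leaf` is a hypothetical helper name, not part of any package.

```r
# Made-up data; the rows carry names, as in the question.
set.seed(42)
x <- matrix(rnorm(20), nrow = 5,
            dimnames = list(c("s1", "s2", "s3", "s4", "s5"), NULL))
groups <- factor(c("Alcohol", "Ester", "Alcohol", "Hydrocarbon", "Ester"))

# Named lookup: sample name -> colour of its group (colours are arbitrary).
group_cols <- c(Alcohol = "red", Ester = "blue", Hydrocarbon = "darkgreen")
lab_cols <- setNames(group_cols[as.character(groups)], rownames(x))

# Hypothetical helper: set the label colour of every leaf from the lookup.
colour_leaf <- function(n) {
  if (is.leaf(n)) {
    lab <- attr(n, "label")
    attr(n, "nodePar") <- c(attr(n, "nodePar"),
                            list(lab.col = lab_cols[[lab]], pch = NA))
  }
  n
}

dend <- dendrapply(as.dendrogram(hclust(dist(x))), colour_leaf)
plot(dend)
```

Because the color is found through the label text itself, the result does not depend on the order in which the leaves are drawn.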