2016-07-22

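Create a hierarchical cluster object

I have n observations, for which I have computed m clusterings. The clusters I generated are in fact hierarchically divisive, even though they were computed independently. Here is a subset of my data: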

print(test) 

    m_0 m_13000 m_14608 m_16278 
    <dbl> <dbl> <dbl> <dbl> 
1  1  10 101 1001 
2  1  10 101 1002 
3  1  11 102 1003 
4  1  11 102 1004 
5  1  12 103 1005 
6  1  12 104 1006 
7  2  13 105 1007 
8  2  13 106 1008 
9  2  13 106 1009 
10  2  14 107 1010 
.. ...  ...  ...  ... 

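Each row i = 1:n is an observation, and each column j = 1:m gives the observations' cluster membership under clustering j. Cluster IDs are unique across the clustering solutions, i.e. min(test[, j]) > max(test[, j-1]).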
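The two properties described here (nested clusterings, and IDs that increase from one column to the next) can be checked directly. The following is a minimal base R sketch, assuming the ten rows printed above are stored in a plain data.frame; the names `memb` and `is_nested` are hypothetical and chosen only for illustration.

```r
# The ten rows shown above, as a standard data.frame (hypothetical name: memb)
memb <- data.frame(
  m_0     = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  m_13000 = c(10, 10, 11, 11, 12, 12, 13, 13, 13, 14),
  m_14608 = c(101, 101, 102, 102, 103, 104, 105, 106, 106, 107),
  m_16278 = 1001:1010
)

# Hypothetical helper: TRUE if every cluster in column j + 1 lies inside
# exactly one cluster of column j (columns ordered coarse to fine)
is_nested <- function(memb) {
  memb <- as.data.frame(memb)
  for (j in seq_len(ncol(memb) - 1)) {
    # number of distinct parent IDs seen for each child cluster
    parents_per_child <- tapply(memb[[j]], memb[[j + 1]],
                                function(p) length(unique(p)))
    if (any(parents_per_child != 1)) return(FALSE)
  }
  TRUE
}

# IDs in each column are larger than every ID in the previous column
ids_increasing <- all(sapply(2:ncol(memb), function(j) {
  min(memb[[j]]) > max(memb[[j - 1]])
}))

nested_ok <- is_nested(memb)
```

If `is_nested()` returns FALSE for the full data, the columns do not form a tree, and a conversion to a dendrogram-like object would not be well defined.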

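The observations are represented as vertices on an igraph graph. I would like to convert the test data above into a merge matrix to pass to igraph::make_clusters for further processing. What is the best way to do this? I looked at the merge matrix created by this example, but I don't really understand it. Can anyone help?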

Answers


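My solution ended up being a modified version of the data-frame-to-Newick-tree-string conversion from the answer to a related SO question about dendrograms. I then read the resulting string into a phylo object using phytools::read.newick, at which point I can use ape::as.hclust to convert it to an hclust object (if needed). It works well.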
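Since the question mentions not understanding merge matrices, here is a small base R illustration of the convention used by hclust objects, as documented for stats::hclust: each row records one merge, negative entries are original observations, and positive entries refer to the row of an earlier merge. The data vector `x` is made up for the example, and igraph's own merge matrices may use a different numbering, so treat this only as a guide to the hclust side.

```r
# Five made-up points on a line; distances are chosen to avoid ties
x <- c(0, 1, 5, 7, 20)

# Complete linkage is the default method of hclust()
hc <- hclust(dist(x))

# One row per merge, so n - 1 rows for n observations.
# Negative entry -i: original observation i.
# Positive entry  k: the cluster created in row k of this matrix.
hc$merge

# Height at which each merge (row) happens, in the same row order
hc$height
```

In this example the first two rows join single observations, the third row joins the two clusters created in rows 1 and 2, and the last row attaches the remaining observation.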

The solution from the other SO answer (slightly edited):

Note: these functions do not seem to play well with tibbles, so use standard data.frames instead.

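df2newick <- function(df, innerlabel = FALSE) {
  # Recursively build the Newick substring for value `a` found in column `i`
  traverse <- function(a, i, innerl) {
    if (i < ncol(df)) {
      # Distinct values in the next column among rows where column i equals `a`
      alevelinner <- as.character(
        unique(df[which(as.character(df[, i]) == a), i + 1])
      )
      desc <- NULL
      for (b in alevelinner)
        desc <- c(desc, traverse(b, i + 1, innerl))
      il <- NULL
      # With innerlabel = TRUE, `a` is appended after a comma
      if (innerl == TRUE)
        il <- paste0(",", a)
      (newickout <- paste("(", paste(desc, collapse = ","), ")", il,
                          sep = ""))
    } else {
      # Last column: the value itself is a leaf
      (newickout <- a)
    }
  }

  # Start from the distinct values of the first (coarsest) column
  alevel <- as.character(unique(df[, 1]))
  newick <- NULL
  for (x in alevel)
    newick <- c(newick, traverse(x, 1, innerlabel))
  (newick <- paste("(", paste(newick, collapse = ","), ");", sep = ""))
}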

Reproducible example

ex = structure(list(level.1 = c("1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1"), level.2 = c("883", "883", "883", 
"883", "883", "883", "883", "883", "1758", "883", "883", "883", 
"883"), level.3 = c("2293", "2293", "2293", "2293", "2293", "2293", 
"2293", "2293", "3240", "2293", "2293", "2293", "2293"), level.4 = c("3932", 
"3932", "3932", "3932", "3932", "3932", "3932", "3932", "5139", 
"5777", "3932", "3932", "3932"), level.5 = c("6056", "6056", 
"6056", "6056", "6056", "6056", "6056", "6056", "7472", "8110", 
"6056", "6056", "6056"), level.6 = c("8456", "8545", "8949", 
"8456", "8545", "8456", "8545", "8545", "10385", "11023", "8545", 
"8545", "8545"), level.7 = c("11525", "11635", "12084", "12297", 
"12339", "12297", "12339", "12339", "13632", "14270", "12339", 
"12339", "12339"), name = c("A", "B", "C", "D", "E", "F", "G", 
"H", "I", "J", "K", "L", "M")), class = "data.frame", .Names = c("level.1", 
"level.2", "level.3", "level.4", "level.5", "level.6", "level.7", 
"name"), row.names = c(NA, -13L)) 

treestring = df2newick(ex, innerlabel = FALSE) 

library(phytools) 
extree = collapse.singles(read.newick(text = treestring)) 
extree$node.label = head(names(ex), -1) 
plot(extree, show.node.label = TRUE) 

An alternative (and easier) solution is to use the data.tree package.

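library(data.tree)
# as.Node() on a data.frame reads each row's path from a "pathString" column
# by default, so build one by joining the levels and the leaf name with "/"
ex$pathString = do.call(paste, c(ex, sep = "/"))
tree = as.Node(ex)
library(ape)
ph = as.phylo(tree)
as.hclust(ph)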

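Note, however, that you need some way of defining branch lengths in order to convert to an hclust object. The same constraint applies to my other answer.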
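One possible way around the branch-length requirement, sketched here in base R rather than taken from either answer, is to build an hclust-style object directly from the nested membership table and simply use the column position (counted from the finest clustering) as the merge height. Everything below is an assumption made for illustration: the function name `nested_to_hclust` is hypothetical, the heights are arbitrary level indices, and clusters with more than two children are resolved by chaining binary merges at the same height.

```r
# Hypothetical helper: nested membership table (columns ordered coarse to
# fine) -> list with class "hclust", using the level index as height
nested_to_hclust <- function(memb) {
  memb <- as.data.frame(memb)
  n <- nrow(memb)
  current <- -seq_len(n)             # hclust convention: -i is observation i
  merge <- matrix(0L, nrow = n - 1L, ncol = 2L)
  height <- numeric(n - 1L)
  step <- 0L
  # Walk from the finest column to the coarsest; 0 stands for a single root
  cols <- c(rev(seq_len(ncol(memb))), 0L)
  for (k in seq_along(cols)) {
    groups <- if (cols[k] == 0L) rep(1L, n) else memb[[cols[k]]]
    for (g in unique(groups)) {
      ids <- unique(current[groups == g])
      # Join everything inside this group pairwise until one cluster is left
      while (length(ids) > 1L) {
        step <- step + 1L
        merge[step, ] <- ids[1:2]
        height[step] <- k            # assumed height: the level index
        current[current %in% ids[1:2]] <- step
        ids <- c(step, ids[-(1:2)])
      }
    }
  }
  # Leaf order from left to right, obtained by expanding the final merge
  leaves <- function(id) {
    if (id < 0L) -id else c(leaves(merge[id, 1]), leaves(merge[id, 2]))
  }
  structure(
    list(merge = merge, height = height, order = leaves(n - 1L),
         labels = rownames(memb), method = "nested memberships",
         call = match.call(), dist.method = NA_character_),
    class = "hclust"
  )
}

# The ten rows from the question, as a standard data.frame
memb <- data.frame(
  m_0     = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  m_13000 = c(10, 10, 11, 11, 12, 12, 13, 13, 13, 14),
  m_14608 = c(101, 101, 102, 102, 103, 104, 105, 106, 106, 107),
  m_16278 = 1001:1010
)
hc <- nested_to_hclust(memb)

# Cutting between levels should recover the original clusterings
cut_top <- cutree(hc, k = 2)      # compare with memb$m_0
cut_mid <- cutree(hc, h = 3.5)    # compare with memb$m_13000
```

Because the heights are only level indices, they carry no information about dissimilarity. If meaningful heights are available (for example the parameter values encoded in the column names), they could be substituted for `k` in the assignment to `height`, provided they are non-decreasing from fine to coarse.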