2015-12-21 13 views
5

我已經構建了一個蛋白質家族的系統發育樹,可以將其分爲不同的組,根據受體的類型或反應類型對每個蛋白質進行分類。樹中的節點被標記爲受體的類型。如何通過節點或葉子中的標籤摺疊系統發生樹中的分支?

在系統發育樹中,我可以看到屬於相同組或類型受體的蛋白質聚集在同一分支中。因此,我想摺疊這些具有共同標籤的分支,並按給定的關鍵字列表對其進行分組。

的命令應該是這樣的:

./collapse_tree_by_label -f -l phylogenetic_tree.newick -o list_of_labels_to_collapse.txt collapsed_tree.eps(或pdf)

我list_of_labels_to_collapse.txt會是這樣: 甲 乙 ç d

我newick樹會是這樣: (A_1:0.05,A_2:0.03,A_3:0.2,A_4:0.1):0.9;(((B_1:0.05,B_2 :0.02,B_3:0.04):0.6,(C_1:0.6,C_2:0.08):0。 7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2)

輸出圖像而不塌陷是這樣的:

輸出圖像摺疊應該是這樣的(collapsed_tree.eps): http://i.stack.imgur.com/TLXd0.png

三角形的寬度應表示分支長度,而三角形的高度必須表示分支中的節點數。

我一直在玩R中的「猿」包我能夠繪製進化樹,但我仍然無法弄清楚如何通過關鍵字標籤摺疊分支:

require("ape") 

這將加載樹:

cat("((A_1:0.05,A_2:0.03,A_3:0.2,A_4:0.1):0.9,(((B_1:0.05,B_2:0.02,B_3:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n") 
tree.test <- read.tree("ex.tre") 

這裏應該是崩潰

這將繪製樹代碼:

plot(tree.test) 

回答

4

您的樹,因爲它存儲在R中已經有提示存儲爲polytomies。這只是一個繪製三角形表示多態的樹的問題。

有一個在ape無功能做到這一點,我是知道的,但如果你惹的繪圖功能一點點,你可以把它關閉

# Step 1: make edges for descendent nodes invisible in plot: 
groups <- c("A", "B", "C", "D") 
group_edges <- numeric(0) 
for(group in groups){ 
    group_edges <- c(group_edges,getMRCA(tree.test,tree.test$tip.label[grepl(group, tree.test$tip.label)])) 
} 
edge.width <- rep(1, nrow(tree.test$edge)) 
edge.width[tree.test$edge[,1] %in% group_edges ] <- 0 


# Step 2: plot the tree with the hidden edges 
plot(tree.test, show.tip.label = F, edge.width = edge.width) 

# Step 3: add triangles 
add_polytomy_triangle <- function(phy, group){ 
    root <- length(phy$tip.label)+1 
    group_node_labels <- phy$tip.label[grepl(group, phy$tip.label)] 
    group_nodes <- which(phy$tip.label %in% group_node_labels) 
    group_mrca <- getMRCA(phy,group_nodes) 

    tip_coord1 <- c(dist.nodes(phy)[root, group_nodes[1]], group_nodes[1]) 
    tip_coord2 <- c(dist.nodes(phy)[root, group_nodes[1]], group_nodes[length(group_nodes)]) 
    node_coord <- c(dist.nodes(phy)[root, group_mrca], mean(c(tip_coord1[2], tip_coord2[2]))) 

    xcoords <- c(tip_coord1[1], tip_coord2[1], node_coord[1]) 
    ycoords <- c(tip_coord1[2], tip_coord2[2], node_coord[2]) 
    polygon(xcoords, ycoords) 
} 

然後你只需通過有循環添加三角形的組

for(group in groups){ 
    add_polytomy_triangle(tree.test, group) 
} 
+0

非常感謝您!這工作! – RDlady

+0

只是另一個問題。此算法是否適用於將不同分支合併爲子樹(如將兩個三角形合併爲一個)? – RDlady

+0

@RDlady如果你的意思是你想合併,例如,組B和C爲一個組,你可以在指定組時使用正則表達式:'groups < - c(「A」,「B | C」,「D」 )'。或者,您可以重新命名提示標籤,以便它們對應於您想要的組 –

1

我認爲腳本終於做了我想要的。 從@CactusWoman提供的答案中,我稍微更改了代碼,因此腳本將嘗試找到表示與我的搜索模式匹配的最大分支的MRCA。這解決了不合並非多重分支的問題,或者因爲一個匹配節點錯誤地位於正確分支之外而摺疊整個樹。此外,我還包含一個參數,該參數表示給定分支中的模式丰度比的限制,因此我們可以選擇並摺疊/分組至少90%的提示與搜索模式匹配,以便例。

library(geiger) 
library(phylobase) 
library(ape) 

#functions 
find_best_mrca <- function(phy, group, threshold){ 

    group_matches <- phy$tip.label[grepl(group, phy$tip.label, ignore.case=TRUE)] 
    group_mrca <- getMRCA(phy,phy$tip.label[grepl(group, phy$tip.label, ignore.case=TRUE)]) 
    group_leaves <- tips(phy, group_mrca) 
    match_ratio <- length(group_matches)/length(group_leaves) 

     if(match_ratio < threshold){ 

      #start searching for children nodes that have more than 95% of descendants matching to the search pattern 
      mrca_children <- descendants(as(phy,"phylo4"), group_mrca, type="all") 
      i <- 1 
      new_ratios <- NULL 
      nleaves <- NULL 
      names(mrca_children) <- NULL 

      for(new_mrca in mrca_children){ 
       child_leaves <- tips(tree.test, new_mrca) 
       child_matches <- grep(group, child_leaves, ignore.case=TRUE) 
       new_ratios[i] <- length(child_matches)/length(child_leaves) 
       nleaves[i] <- length(tips(phy, new_mrca)) 
       i <- i+1 
      } 



      match_result <- data.frame(mrca_children, new_ratios, nleaves) 


      match_result_sorted <- match_result[order(-match_result$nleaves,match_result$new_ratios),] 
      found <- numeric(0); 

      print(match_result_sorted) 

      for(line in 1:nrow(match_result_sorted)){ 
       if(match_result_sorted$ new_ratios[line]>=threshold){ 
        return(match_result_sorted$mrca_children[line]) 
        found <- 1 
       } 

      } 

      if(found==0){return(found)} 
     }else{return(group_mrca)} 




} 

add_triangle <- function(phy, group,phylo_plot){ 

    group_node_labels <- phy$tip.label[grepl(group, phy$tip.label)] 
    group_mrca <- getMRCA(phy,group_node_labels) 
    group_nodes <- descendants(as(tree.test,"phylo4"), group_mrca, type="tips") 
    names(group_nodes) <- NULL 

    x<-phylo_plot$xx 
    y<-phylo_plot$yy 


    x1 <- max(x[group_nodes]) 
    x2 <-max(x[group_nodes]) 
    x3 <- x[group_mrca] 

    y1 <- min(y[group_nodes]) 
    y2 <- max(y[group_nodes]) 
    y3 <- y[group_mrca] 

    xcoords <- c(x1,x2,x3) 
    ycoords <- c(y1,y2,y3) 

    polygon(xcoords, ycoords) 

    return(c(x2,y3)) 

} 



#main 

    cat("((A_1:0.05,E_2:0.03,A_3:0.2,A_4:0.1,A_5:0.1,A_6:0.1,A_7:0.35,A_8:0.4,A_9:01,A_10:0.2):0.9,((((B_1:0.05,B_2:0.05):0.5,B_3:0.02,B_4:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n") 
tree.test <- read.tree("ex.tre") 


# Step 1: Find the best MRCA that matches to the keywords or search patten 

groups <- c("A", "B|C", "D") 
group_labels <- groups 

group_edges <- numeric(0) 
edge.width <- rep(1, nrow(tree.test$edge)) 
count <- 1 


for(group in groups){ 

    best_mrca <- find_best_mrca(tree.test, group, 0.90) 

    group_leaves <- tips(tree.test, best_mrca) 

    groups[count] <- paste(group_leaves, collapse="|") 
    group_edges <- c(group_edges,best_mrca) 

    #Step2: Remove the edges of the branches that will be collapsed, so they become invisible 
    edge.width[tree.test$edge[,1] %in% c(group_edges[count],descendants(as(tree.test,"phylo4"), group_edges[count], type="all")) ] <- 0 
    count = count +1 

} 


#Step 3: plot the tree hiding the branches that will be collapsed/grouped 

last_plot.phylo <- plot(tree.test, show.tip.label = F, edge.width = edge.width) 

#And save a copy of the plot so we can extract the xy coordinates of the nodes 
#To get the x & y coordinates of a plotted tree created using plot.phylo 
#or plotTree, we can steal from inside tiplabels: 
last_phylo_plot<-get("last_plot.phylo",envir=.PlotPhyloEnv) 

#Step 4: Add triangles and labels to the collapsed nodes 
for(i in 1:length(groups)){ 

    text_coords <- add_triangle(tree.test, groups[i],last_phylo_plot) 

    text(text_coords[1],text_coords[2],labels=group_labels[i], pos=4) 

} 
1

我也一直在尋找這種工具的時代,沒有那麼多倒塌分類羣,但對於崩潰基於數字支持值內部節點。

ape包中的di2multi函數可以將節點摺疊爲多節點,但它目前只能通過分支長度閾值來完成此操作。 這是一個粗略的改編,它允許通過節點支持值閾值進行摺疊(默認閾值= 0.5)。

使用需要您自擔風險,但它適用於我的植根於貝葉斯樹。

di2multi4node <- function (phy, tol = 0.5) 
    # Adapted di2multi function from the ape package to plot polytomies 
    # based on numeric node support values 
    # (di2multi does this based on edge lengths) 
    # Needs adjustment for unrooted trees as currently skips the first edge 
{ 
    if (is.null(phy$edge.length)) 
    stop("the tree has no branch length") 
    if (is.na(as.numeric(phy$node.label[2]))) 
    stop("node labels can't be converted to numeric values") 
    if (is.null(phy$node.label)) 
    stop("the tree has no node labels") 
    ind <- which(phy$edge[, 2] > length(phy$tip.label))[as.numeric(phy$node.label[2:length(phy$node.label)]) < tol] 
    n <- length(ind) 
    if (!n) 
    return(phy) 
    foo <- function(ancestor, des2del) { 
    wh <- which(phy$edge[, 1] == des2del) 
    for (k in wh) { 
     if (phy$edge[k, 2] %in% node2del) 
     foo(ancestor, phy$edge[k, 2]) 
     else phy$edge[k, 1] <<- ancestor 
    } 
    } 
    node2del <- phy$edge[ind, 2] 
    anc <- phy$edge[ind, 1] 
    for (i in 1:n) { 
    if (anc[i] %in% node2del) 
     next 
    foo(anc[i], node2del[i]) 
    } 
    phy$edge <- phy$edge[-ind, ] 
    phy$edge.length <- phy$edge.length[-ind] 
    phy$Nnode <- phy$Nnode - n 
    sel <- phy$edge > min(node2del) 
    for (i in which(sel)) phy$edge[i] <- phy$edge[i] - sum(node2del < 
                  phy$edge[i]) 
    if (!is.null(phy$node.label)) 
    phy$node.label <- phy$node.label[-(node2del - length(phy$tip.label))] 
    phy 
} 
1

這是基於phytools::phylo.toBackbone功能我的回答, 看到http://blog.phytools.org/2013/09/even-more-on-plotting-subtrees-as.htmlhttp://blog.phytools.org/2013/10/finding-edge-lengths-of-all-terminal.html。首先,在代碼結尾加載函數。

library(ape) 
library(phytools) #phylo.toBackbone 
library(phangorn) 

cat("((A_1:0.05,E_2:0.03,A_3:0.2,A_4:0.1,A_5:0.1,A_6:0.1,A_7:0.35,A_8:0.4,A_9:01,A_10:0.2):0.9,((((B_1:0.05,B_2:0.05):0.5,B_3:0.02,B_4:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n") 

phy <- read.tree("ex.tre") 
groups <- c("A", "B|C", "D") 

backboneoftree<-makebackbone(groups,phy) 
# tip.label clade.label N  depth 
# 1  A_1   A 10 0.2481818 
# 2  B_1   B|C 6 0.9400000 
# 3  D_1   D 5 0.4600000 

par(mfrow=c(2,2), mar=c(0,2,2,0)) 
plot(phy, main="Complete tree") 
plot(backboneoftree) 

makebackbone<-function(groupings,phy){ 
    listofspecies<-phy$tip.label 
    listtopreserve<-list() 
    lengthofclades<-list() 
    meandistnode<-list() 
    newedgelengths<-list() 
    for (i in 1:length(groupings)){ 
    groupings<-groups 
    bestmrca<-getMRCA(phy,grep(groupings[i], phy$tip.label)) 
    mrcatips<-phy$tip.label[unlist(phangorn::Descendants(phy,bestmrca, type="tips"))] 
    listtopreserve[i]<- mrcatips[1] 
    meandistnode[i]<- mean(dist.nodes(phy)[unlist(lapply(mrcatips, 
    function(x) grep(x, phy$tip.label))),bestmrca]) 
    lengthofclades[i]<-length(mrcatips) 
    provtree<-drop.tip(phy,mrcatips, trim.internal=F, subtree = T) 
    n3<-length(provtree$tip.label) 
    newedgelengths[i]<-setNames(provtree$edge.length[sapply(1:n3,function(x,y) 
     which(y==x),y=provtree$edge[,2])], 
     provtree$tip.label)[provtree$tip.label[grep("tips",provtree$tip.label)] ] 
    } 
    newtree<-drop.tip(phy,setdiff(listofspecies,unlist(listtopreserve)), 
        trim.internal = T) 
    n<-length(newtree$tip.label) 
    newtree$edge.length[sapply(1:n,function(x,y) 
    which(y==x),y=newtree$edge[,2])]<-unlist(newedgelengths)+unlist(meandistnode) 
    trans<-data.frame(tip.label=newtree$tip.label,clade.label=groupings, 
        N=unlist(lengthofclades), depth=unlist(meandistnode)) 
    rownames(trans)<-NULL 
    print(trans) 
    backboneoftree<-phytools::phylo.toBackbone(newtree,trans) 
    return(backboneoftree) 
} 

enter image description here

編輯:我沒有嘗試這樣做,但也可能是另一個答案:「腳本和功能改造一棵樹的頂端枝條,即厚度或三角形,與與某些參數都相關寬度(例如,進化枝的種數)(tip.branches.R)」 http://www.sysbot.biologie.uni-muenchen.de/en/people/cusimano/use_r.html http://www.sysbot.biologie.uni-muenchen.de/en/people/cusimano/tip.branches.R

相關問題