2013-05-10

R and GO.db: going through all GO terms

This is a very specific question, but maybe someone here knows how to do it.

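I want to go through all GO terms of an ontology such as "BP" (package GO.db). I don't necessarily need to walk the tree recursively. My only requirement on the order in which the GO terms are evaluated is that, for any given GO term, all of its children have been evaluated before the term itself.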

In other words, I want to build a character vector V of GO terms such that, if G_x and G_y are two GO terms and G_x is a parent of G_y, then their positions i_x and i_y in V satisfy i_x > i_y.
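The requirement can be written as a check on V. This is a minimal base-R sketch, assuming the parent-child links are available as a data frame `edges` with columns `parent` and `child` (hypothetical names; the term IDs below are toy values, not real GO terms):

```r
# Hypothetical helper: does V list every child before each of its parents?
# `edges` is an assumed data frame with one row per parent -> child link.
is_children_first <- function(V, edges) {
  i_parent <- match(edges$parent, V)  # position i_x of each parent in V
  i_child  <- match(edges$child,  V)  # position i_y of each child in V
  # every term must be present, and every parent must come after its child
  all(!is.na(i_parent) & !is.na(i_child) & i_parent > i_child)
}

# Toy DAG (made-up IDs): R -> A, R -> B, A -> C, B -> C
edges <- data.frame(parent = c("R", "R", "A", "B"),
                    child  = c("A", "B", "C", "C"),
                    stringsAsFactors = FALSE)

is_children_first(c("C", "A", "B", "R"), edges)  # TRUE
is_children_first(c("A", "C", "B", "R"), edges)  # FALSE: A precedes its child C
```

Note that several vectors can satisfy the check (here `c("C", "B", "A", "R")` would pass too), so the requirement fixes a partial order, not a unique V.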


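As a partial answer, `g = makeGOGraph("BP")` creates a graphNEL (from package graph) of the BP ontology; `library(RBGL); dfs(g)` returns the indices of a depth-first search. Maybe this provides the information you need? – 2013-05-10 12:24:15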
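The depth-first idea works because a post-order traversal emits a term only after all of its descendants have been emitted. A minimal base-R sketch, assuming the ontology is given as a named list `children` mapping each term to its child terms (hypothetical structure and toy IDs; with GO.db something like `as.list(GOBPCHILDREN)` might supply it, but that is untested here):

```r
# Post-order DFS: a term is appended only after all of its descendants,
# so every child ends up before its parents in the result.
# `children` is an assumed named list: term -> character vector of children.
postorder_terms <- function(root, children) {
  visited <- new.env()   # terms already expanded (a DAG can reach a term twice)
  out <- character(0)
  visit <- function(term) {
    if (exists(term, envir = visited, inherits = FALSE)) return(invisible())
    assign(term, TRUE, envir = visited)
    kids <- children[[term]]
    for (k in kids[!is.na(kids)]) visit(k)  # skip NA entries used for leaves
    out <<- c(out, term)                    # emitted after all descendants
  }
  visit(root)
  out
}

children <- list(R = c("A", "B"), A = "C", B = "C", C = character(0))
postorder_terms("R", children)  # "C" "A" "B" "R"
```

The recursion depth is bounded by the depth of the DAG, not by the number of terms.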

Answers


I think this (almost) works. The trick is that `unique` (see `?unique`) keeps the first instance of duplicated elements.
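A quick illustration of that behaviour with toy values: prepending a newer generation and then calling `unique()` moves a repeated term to its newest (front) position.

```r
out <- c("B", "A", "R")          # terms from older generations
children <- c("C", "B")          # newer generation, which contains "B" again
unique(append(children, out))    # "C" "B" "A" "R": the front copy of "B" wins
```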

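Edit: On reflection, this just puts the terms with the longest path (i.e. the most generations) at the beginning of the vector. I think there could be a case where a term lies on two branches, one of them with a shorter path: the term would be placed according to the longer path, while an earlier term is placed according to the shorter one. That said, if you're fine with a rough approximation...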

library(GO.db)

# Root nodes for reference:
# BP = "GO:0008150"
# CC = "GO:0005575"
# MF = "GO:0003674"

GO_order <- function(node = "GO:0008150", ontology = "BP") {

    # pick the parent -> children map for the requested ontology
    GOCHILDREN <- switch(ontology,
                         BP = GOBPCHILDREN,
                         CC = GOCCCHILDREN,
                         MF = GOMFCHILDREN,
                         stop("ontology must be \"BP\", \"CC\" or \"MF\""))

    parents <- node 

    # initialize output 
    out <- c(parents) 

    # do the following until there are no more parents 
    while (any(!is.na(parents))) { 
        # Get the unique children of the parents (that aren't NA)
        children <- unique(unlist(mget(parents[!is.na(parents)], GOCHILDREN)))

        # prepend the children to `out`; unique() keeps the first
        # instance of a duplicate, so a term that reappears in a later
        # generation keeps its newest (front) position
        out <- unique(append(children[!is.na(children)], out))

        # children become the parents of the next generation
        parents <- children
    } 
    return(out) 
} 
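To see how the loop treats a term reachable by paths of different lengths (the case the edit above worries about), here is a minimal sketch that replays the same logic on a toy DAG without GO.db. `toy_order` and the term IDs are hypothetical; a plain named list stands in for `GOBPCHILDREN`, with leaves mapped to NA.

```r
# Same generation-by-generation loop as GO_order(), on a named list.
toy_order <- function(node, children_map) {
  parents <- node
  out <- parents
  while (any(!is.na(parents))) {
    children <- unique(unlist(children_map[parents[!is.na(parents)]]))
    out <- unique(append(children[!is.na(children)], out))
    parents <- children
  }
  unname(out)
}

# D is reachable via a short branch (R -> B -> D) and a long one (R -> A -> C -> D)
toy <- list(R = c("A", "B"), A = "C", B = "D", C = "D", D = NA)
toy_order("R", toy)  # "D" "C" "A" "B" "R"
```

In this toy case D ends up at the position of its deepest occurrence, ahead of both of its parents B and C.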

Using the internal functions of GO.db:
library(GO.db)

GO_child <- function(node = "GO:0008150", ontology = "BP") {
    # root nodes: MF = "GO:0003674", BP = "GO:0008150", CC = "GO:0005575"
    # returns all descendants (offspring) of `node` in the chosen ontology
    res <- switch(ontology,
                  BP = GOBPOFFSPRING[[node]],
                  CC = GOCCOFFSPRING[[node]],
                  MF = GOMFOFFSPRING[[node]],
                  stop("ontology must be \"BP\", \"CC\" or \"MF\""))
    return(res)
}
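One way offspring sets could give the ordering asked for in the question (my assumption, not something this answer states): a parent's offspring contain the child plus all of the child's offspring, so in a DAG the parent always has strictly more offspring. Sorting terms by offspring count, ascending, therefore places every child before its parents. A minimal sketch on a toy named list; `order_by_offspring` and the IDs are hypothetical, and with GO.db the list might come from something like `as.list(GOBPOFFSPRING)` (untested here).

```r
# Hypothetical helper: children-first order from offspring set sizes.
# `offspring` is an assumed named list: term -> all descendants (NA for leaves).
order_by_offspring <- function(offspring) {
  # count real descendants, ignoring NA entries used for leaf terms
  n <- vapply(offspring, function(x) sum(!is.na(x)), integer(1))
  names(offspring)[order(n)]
}

toy_offspring <- list(R = c("A", "B", "C"), A = "C", B = "C", C = NA)
order_by_offspring(toy_offspring)  # "C" "A" "B" "R"
```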