我想進行通路富集分析。 我有21個重要基因的列表,以及我想檢查的多種類型的通路(即檢查KEGG通路,GOterms,複合體等的富集)。解讀R代碼功能
我發現這個代碼例子,在一箇舊BIOC崗位。但是,我無法適應自己。
首先, 1這是什麼意思?我不知道這個多重冒號語法。
hyperg <- Category:::.doHyperGInternal
2 - 我不明白這條線是如何工作的。 hyperg.test是一個需要傳遞給它3個變量的函數,對嗎?這是行不知何故通過「genes.by.pathways,significant.genes和all.geneIDs到THR hyperg.test?
,我想pVals.by.pathway<-t(sapply(genes.by.pathway, hyperg.test, significant.genes, all.geneIDs))
代碼以適應
library(KEGGREST)
library(org.Hs.eg.db)
# created named list, length 449, eg:
# path:hsa00010: "Glycolysis/Gluconeogenesis"
pathways <- keggList("pathway", "hsa")
# make them into KEGG-style human pathway identifiers
human.pathways <- sub("path:", "", names(pathways))
# for demonstration, just use the first ten pathways
demo.pathway.ids <- head(human.pathways, 10)
demo.pathways <- setNames(keggGet(demo.pathway.ids), demo.pathway.ids)
genes.by.pathway <- lapply(demo.pathways, function(demo.pathway) {
demo.pathway$GENE[c(TRUE, FALSE)]
})
all.geneIDs <- keys(org.Hs.eg.db)
# chose one of these for demonstration. the first (a whole genome random
# set of 100 genes) has very little enrichment, the second, a random set
# from the pathways themselves, has very good enrichment in some pathways
set.seed(123)
significant.genes <- sample(all.geneIDs, size=100)
#significant.genes <- sample(unique(unlist(genes.by.pathway)), size=10)
# the hypergeometric distribution is traditionally explained in terms of
# drawing a sample of balls from an urn containing black and white balls.
# to keep the arguments straight (in my mind at least), I use these terms
# here also
hyperg <- Category:::.doHyperGInternal
hyperg.test <-
function(pathway.genes, significant.genes, all.genes, over=TRUE)
{
white.balls.drawn <- length(intersect(significant.genes, pathway.genes))
white.balls.in.urn <- length(pathway.genes)
total.balls.in.urn <- length(all.genes)
black.balls.in.urn <- total.balls.in.urn - white.balls.in.urn
balls.pulled.from.urn <- length(significant.genes)
hyperg(white.balls.in.urn, black.balls.in.urn,
balls.pulled.from.urn, white.balls.drawn, over)
}
pVals.by.pathway <-
t(sapply(genes.by.pathway, hyperg.test, significant.genes, all.geneIDs))
print(pVals.by.pathway)
是像您期望或代碼行爲不,你只是想只是更好地理解它?目前還不清楚你是否有問題或者只是想要信息。 – cdeterman 2014-11-24 13:43:19
@cdeterman - 兩個,我真的不明白,尤其是最後一行,這使得它很難改變它,我的「M收到此錯誤> pVals.by.pathway < - + T(sapply(基因。 by.pathway,hyperg.test,significant.genes,all.geneIDs)) FUN(X [[1L]],...的錯誤):找不到函數「hyperg」 – user2814482 2014-11-24 13:54:37