2010-06-01 42 views
5

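Does anyone know how to take advantage of ggplot or lattice for survival analysis? It would be nice to be able to draw trellis or facet-like survival plots. (Title: Using Surv objects in ggplot or lattice)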


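So in the end I played around and sort of found a solution for a Kaplan-Meier plot. I apologize for copying the list elements into a data frame one by one, but I couldn't figure out another way.

Note: it only works with two levels of strata. If anyone knows how I could use x<-length(stratum) to handle any number of strata, please let me know (in Stata I could append to a macro; I'm not sure how that works in R).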

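library(survival)
library(ggplot2)

ggkm <- function(time, event, stratum) {
    m2s <- Surv(time, as.numeric(event))
    fit <- survfit(m2s ~ stratum)

    # Collect the survfit list elements in a data frame for ggplot
    f <- data.frame(time = fit$time, surv = fit$surv)
    # Label each row with its stratum (hard-coded for exactly two strata)
    f$strata <- c(rep(names(fit$strata[1]), fit$strata[1]),
                  rep(names(fit$strata[2]), fit$strata[2]))
    f$upper <- fit$upper
    f$lower <- fit$lower

    # The "+" must end each line so the layers stay in one expression
    r <- ggplot(f, aes(x = time, y = surv, fill = strata, group = strata)) +
        geom_line() +
        geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3)
    return(r)
}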
+3

Ramon Saccilotto wrote a ggplot2 tutorial that includes a function for plotting KM curves in ggplot2: http://www.ceb-institute.org/bbs/wp-content/uploads/2011/09/handout_ggplot2.pdf – MattBagg 2012-12-31 02:35:51

Answers

4

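I have been using the following code with lattice. The first function draws the KM curve for one group and would typically be used as the panel.groups function, while the second one handles the entire panel and adds the p-value of the log-rank test: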

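# Per-group panel function: draws one Kaplan-Meier step curve
km.panel <- function(x, y, type, mark.time = TRUE, ...) {
    # Drop observations with a missing time or status
    na.part <- is.na(x) | is.na(y)
    x <- x[!na.part]
    y <- y[!na.part]
    if (length(x) == 0) return()
    fit <- survfit(Surv(x, y) ~ 1)
    if (mark.time) {
        # Mark the times at which censored observations occur
        cens <- which(fit$time %in% x[y == 0])
        panel.xyplot(fit$time[cens], fit$surv[cens], type = "p", ...)
    }
    # Start the curve at (0, 1) and draw it as a step function
    panel.xyplot(c(0, fit$time), c(1, fit$surv), type = "s", ...)
}

# Whole-panel function: prints the log-rank p-value, then draws each group
logrank.panel <- function(x, y, subscripts, groups, ...) {
    lr <- survdiff(Surv(x, y) ~ groups[subscripts])
    # Degrees of freedom: groups with a positive expected count, minus one
    df <- sum(lr$exp > 0) - 1
    p <- 1 - pchisq(lr$chisq, df)
    p.text <- paste("p=", signif(p, 2))
    # Place the p-value in the bottom-right corner of the panel
    grid.text(p.text, 0.95, 0.05, just = c("right", "bottom"))
    panel.superpose(x = x, y = y, subscripts = subscripts, groups = groups, ...)
}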
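The p-value step inside logrank.panel can also be run outside of a panel, which makes it easier to check. A minimal sketch, assuming the survival package and its colon data set are available; the names lr_df and lr_p are only illustrative:

```r
library(survival)

# Log-rank test of survival across the three treatment arms in colon
lr <- survdiff(Surv(time, status) ~ rx, data = colon)

# Degrees of freedom: groups with a positive expected count, minus one
lr_df <- sum(lr$exp > 0) - 1

# Upper-tail chi-squared probability, equivalent to 1 - pchisq(lr$chisq, lr_df)
lr_p <- pchisq(lr$chisq, lr_df, lower.tail = FALSE)

print(signif(lr_p, 2))
```

Using lower.tail = FALSE rather than subtracting from 1 avoids losing precision when the p-value is very small.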

The censoring indicator has to be coded 0-1 for this code to work. Usage would be along the following lines:

library(survival) 
library(lattice) 
library(grid) 
data(colon) #built-in example data set 
xyplot(status~time, data=colon, groups=rx, panel.groups=km.panel, panel=logrank.panel) 

If you just use 'panel = panel.superpose', you will not get the p-value.
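On the 0-1 requirement: some data sets code the status differently, for example 1 = censored and 2 = dead, as in the lung data that ships with the survival package. A minimal sketch of the recoding, using a made-up vector status_raw rather than real data:

```r
# Made-up status vector coded 1 = censored, 2 = event
status_raw <- c(1, 2, 2, 1, 2)

# Recode to 0 = censored, 1 = event, as km.panel expects
status01 <- as.integer(status_raw == 2)

print(status01)
```

The recoded column can then be used on the left-hand side of the xyplot formula in place of the original status.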

1

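I started out following almost exactly the approach you use in your updated answer. The irritating thing about the survfit output, though, is that it only records the changes, not every tick: it will give you 0 - 100%, 3 - 88% instead of 0 - 100%, 1 - 100%, 2 - 100%, 3 - 88%. If you feed that into ggplot, your lines will slope from 0 to 3 rather than staying flat and dropping straight down at 3. That might be fine depending on your application and assumptions, but it is not the classic KM plot. This is how I handled the varying number of strata: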

groupvec <- c() 
for(i in seq_along(x$strata)){ 
    groupvec <- append(groupvec, rep(x = names(x$strata[i]), times = x$strata[i])) 
} 
f$strata <- groupvec 
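The same labels can also be built in one vectorised call, since rep() accepts a vector of counts in times. A minimal sketch, with a hypothetical named count vector strata_counts standing in for x$strata:

```r
# Hypothetical stand-in for x$strata: number of rows per stratum
strata_counts <- c("rx=Obs" = 3, "rx=Lev" = 2, "rx=Lev+5FU" = 4)

# Loop version, as in the snippet above
groupvec <- c()
for (i in seq_along(strata_counts)) {
    groupvec <- append(groupvec,
                       rep(x = names(strata_counts[i]), times = strata_counts[i]))
}

# Vectorised version: each name is repeated by its matching count
groupvec_vec <- rep(names(strata_counts), times = strata_counts)

print(identical(groupvec, groupvec_vec))
```

Because it makes no assumption about how many strata there are, this form would also lift the two-strata limit of the ggkm function in the question.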

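For what it's worth, this is what I ended up doing. It is not really a KM plot either, because I am not calculating the KM estimate as such (although I have no censoring, so the result should be equivalent, I believe).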

survcurv <- function(surv.time, group = rep(NA, length(surv.time))) {
    # Must be able to coerce surv.time and group to vectors
    if (!is.vector(as.vector(surv.time)) || !is.vector(as.vector(group))) {
        stop("surv.time and group must be coercible to vectors.")
    }

    # Make sure that the surv.time is numeric
    if (!is.numeric(surv.time)) {stop("Survival times must be numeric.")}

    # Group can be just about anything, but must be the same length as surv.time
    # (the default treats all observations as a single group)
    if (length(surv.time) != length(group)) {
        stop("The vectors passed to the surv.time and group arguments must be of equal length.")
    }

    # What is the maximum number of ticks recorded?
    max.time <- max(surv.time)

    # What is the number of groups in the data?
    n.groups <- length(unique(group))

    # Use the number of ticks (plus one for t = 0) times the number of groups to
    # create an empty skeleton of the results.
    curves <- data.frame(tick = rep(0:max.time, n.groups), group = NA, surv.prop = NA)

    # Add the group names, repeating each one across a full run of ticks so
    # that every group is paired with every tick. Plain recycling would
    # interleave the groups and can miss group/tick combinations.
    curves$group <- rep(unique(group), each = length(0:max.time))

    # For each row, calculate the proportion of group[i] surviving past tick[i]
    for (i in seq_len(nrow(curves))) {
        in.group <- group %in% curves$group[i]
        curves$surv.prop[i] <- sum(surv.time[in.group] > curves$tick[i]) / sum(in.group)
    }

    # Return the results, ordered by group and tick - easier for humans to read.
    return(curves[order(curves$group, curves$tick), ])
}
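On the earlier point that survfit only reports the times at which the curve changes: one way to get a value at every tick is to carry the last estimate forward as a step function. A minimal sketch using base R's approx() with method = "constant"; the change points are made-up numbers mirroring the 0 - 100%, 3 - 88% example, not real survfit output:

```r
# Made-up change points, in the shape survfit reports them
change_time <- c(0, 3, 5)
change_surv <- c(1.00, 0.88, 0.70)

# Every tick at which a value is wanted
ticks <- 0:6

# method = "constant" holds each value until the next change point;
# rule = 2 carries the last value past the final change point
filled <- approx(change_time, change_surv, xout = ticks,
                 method = "constant", rule = 2)

per_tick <- data.frame(tick = filled$x, surv = filled$y)
print(per_tick)
```

Alternatively, the change points can be left as they are and drawn with a step geometry such as ggplot2's geom_step(), which gives the flat segments and vertical drops without expanding the data.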