2015-04-24 19 views
3

How do I match patient data for conditional logistic regression in R?

I have the following dataset:

patient_id pre.int.outcome post.int.outcome 
    302949    1   1    
    993564    0   1   
    993570    1   1 
    993575    0   1  
    993792    1   0  

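For each patient, I want to run clogit on the pre- and post-intervention outcomes.

I understand that I need to get the data into the following form first: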

strata   outcome 
    1     1 
    1     1 
    2     0 
    2     0 
    3     0 
    3     1 

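In this form, the strata is the patient number and the outcomes are paired, but I don't know how to get there. Can anyone help or point me to a useful source?

Edit: what I ended up doing was using the reshape function to make the dataset 'long' instead of wide: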

ds1<-reshape(ds, varying=c('pre.int.outcome','post.int.outcome'), v.names='outcome', timevar='before_after', times=c(0,1), direction='long') 

I then sorted by patient_id so that I could use it as my 'strata':

ds1 <- ds1[order(ds1$patient_id), ]
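
For completeness, here is a minimal sketch of how the long data could then be passed to clogit. It assumes the survival package (which provides clogit and strata) and rebuilds the five patients shown above as `ds`. The object name `fit` is only illustrative, and treating `before_after` as the single predictor is an assumption about the intended model.

```r
library(survival)  # provides clogit() and strata()

# The five patients from the question, in wide format
ds <- data.frame(
  patient_id       = c(302949, 993564, 993570, 993575, 993792),
  pre.int.outcome  = c(1, 0, 1, 0, 1),
  post.int.outcome = c(1, 1, 1, 1, 0)
)

# Wide -> long: one row per patient per time point
ds1 <- reshape(ds,
               varying   = c("pre.int.outcome", "post.int.outcome"),
               v.names   = "outcome",
               timevar   = "before_after",
               times     = c(0, 1),
               direction = "long")
ds1 <- ds1[order(ds1$patient_id), ]

# Each patient is one stratum; before_after (0 = pre, 1 = post) is the predictor
fit <- clogit(outcome ~ before_after + strata(patient_id), data = ds1)
summary(fit)
```

Only patients whose outcome changes between the two time points carry information about the `before_after` coefficient, so with so few rows the estimate is illustrative at best.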
+1

You could use 'melt' from library(reshape2), i.e. 'melt(df1, id.var = 'patient_id')[-2]' – akrun

Answers

4

Maybe this helps:

data.frame(strata= rep(1:nrow(df1), each=2), outcome=c(t(df1[2:3]))) 
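
To see what the one-liner does, here is a small worked sketch on the question's data. `df1` is rebuilt from the table in the question, and the name `long` is only illustrative.

```r
# The five patients from the question, in wide format
df1 <- data.frame(
  patient_id       = c(302949, 993564, 993570, 993575, 993792),
  pre.int.outcome  = c(1, 0, 1, 0, 1),
  post.int.outcome = c(1, 1, 1, 1, 0)
)

# t() turns the two outcome columns into a 2 x n matrix; c() then reads that
# matrix column by column, giving pre, post, pre, post, ... in patient order.
long <- data.frame(strata  = rep(1:nrow(df1), each = 2),
                   outcome = c(t(df1[2:3])))
long
```

Each consecutive pair of rows then belongs to one patient, with the pre-intervention outcome first.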
2

Building on akrun's comment and answer, here is a solution using melt from the reshape2 package:

library(reshape2) 

# I created dummy data to make sure my answer works 
# I assumed 4 intervention treatments, but this would work with 
# two treatments. With the dummy data, just make sure nObs/4 is an integer 
nObs = 100 # number of observations 
d = data.frame(patient_id = 1:4, 
      pre.int.outcome = rbinom(4, 1, 0.7), 
      post.int.outcome = rbinom(4, 1, 0.5), 
      intervention = rep(c("a", "b", "c", "d"), each = nObs/4)) 
# melting the data as suggested by akrun 
d2 = melt(d, id.vars = c("patient_id", "intervention")) 

# Creating a strata variable for you with paste 
d2$strata = as.factor(paste(d2$patient_id, d2$variable)) 
# I also clean up the variable to remove patient_id 
# useful if you are concerned about protecting pii 
# one integer label per existing level (avoids creating unused levels)
levels(d2$strata) = seq_along(levels(d2$strata)) 
# last, I clean up the data and create a third "pretty" data.frame 
d3 = d2[ , c("intervention", "value", "strata")] 
head(d3) 
# intervention value strata 
# 1   a  1  2 
# 2   a  1  4 
# 3   a  1  6 
# 4   a  1  8 
# 5   a  1  2 
# 6   a  1  4 
# I also throw in the logistic regression 
myGLM = glm(value ~ intervention, data = d3, family = 'binomial') 
summary(myGLM) 
# prints lots of outputs to screen ... 

# or if you need odds ratios 
myGLM2 = glm(value ~ intervention - 1, data = d3, family = 'binomial') 
exp(myGLM2$coef) 
exp(confint(myGLM2)) 
# also prints lots of outputs to screen ... 

Edit: added 'intervention' based on the OP's comment. I also added the glm to help her or him further.

+1

Also, ["pii"](https://en.wikipedia.org/wiki/Personally_identifiable_information) is personally identifiable information. I realize most people may not know the acronym. See the linked Wikipedia article if you want to learn more. –

+1

Is there a way to include 'pre' and 'post' as row values in your solution, i.e. alongside value, strata and intervention? – user1745691

+1

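Yes. That column was missing from the dummy data, correct? Assuming the column name is 'intervention', you can change the 'melt' call to 'melt(d, id.vars = c("patient_id", "intervention"))'. I will edit my answer to include it. –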
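
One possible way to carry 'pre' and 'post' along as row values, sketched in base R on the question's data rather than the dummy data above. The column name `period` and the object name `long` are assumptions made for illustration; an `intervention` column, if present, could be repeated in the same way as `strata`.

```r
# The five patients from the question, in wide format
d <- data.frame(
  patient_id       = c(302949, 993564, 993570, 993575, 993792),
  pre.int.outcome  = c(1, 0, 1, 0, 1),
  post.int.outcome = c(1, 1, 1, 1, 0)
)

n <- nrow(d)
long <- data.frame(
  strata = rep(seq_len(n), times = 2),           # one stratum per patient
  period = rep(c("pre", "post"), each = n),      # which measurement the row holds
  value  = c(d$pre.int.outcome, d$post.int.outcome)
)
# Make "pre" the reference level so model coefficients read as post vs. pre
long$period <- factor(long$period, levels = c("pre", "post"))
long[order(long$strata), ]
```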