2014-04-14

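How to identify time overlaps using patients' admission and discharge dates - R

I have a data frame (df) with 4 columns:

ID, admitDate (as Date), dcDate (as Date), los (length of stay in days).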

$ admitDate : Date, format: "2009-09-19" "2010-01-24" "2010-09-30" ... 
$ dcDate  : Date, format: "2009-09-23" "2010-01-27" "2010-10-04" ... 
$ los  : num 4 3 4 25 6 3 6 2 2 3 ... 

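I need to be able to tell how many patients (and which patients) are admitted at any given time. That is, I think I need to find the overlaps between the patients' lengths of stay (los). Here is how I define an overlap: (df$admitDate[x] <= df$dcDate[y]) & (df$admitDate[y] <= df$dcDate[x])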
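The overlap condition above can be checked for every pair of stays at once with outer(). A minimal sketch, assuming a small hypothetical toy data frame with the question's column names; starts.before.ends, overlaps and n.overlap are illustrative names:

```r
# Hypothetical toy data with the question's column names
df <- data.frame(
  Unit.Number = c(1L, 2L, 3L),
  admitDate = as.Date(c("2010-01-01", "2010-01-03", "2010-01-10")),
  dcDate    = as.Date(c("2010-01-05", "2010-01-04", "2010-01-12"))
)
# starts.before.ends[x, y] is admitDate[x] <= dcDate[y]
starts.before.ends <- outer(df$admitDate, df$dcDate, "<=")
# Overlap as defined above: (admitDate[x] <= dcDate[y]) & (admitDate[y] <= dcDate[x]);
# the second condition is the transpose of the first
overlaps <- starts.before.ends & t(starts.before.ends)
# Every stay overlaps itself, so subtract 1 to count the other patients
n.overlap <- rowSums(overlaps) - 1
```

This builds an n-by-n matrix, so it is only practical for a moderate number of stays; which(overlaps[x, ]) lists the stays that overlap stay x.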

Any help is much appreciated.

Here is the dput output for the first 20 patients:

> dput(head(df,20)) 
structure(list(Unit.Number = c(2013459L, 2013459L, 2047815L, 
1362858L, 1331174L, 2068040L, 1363711L, 2175972L, 2036695L, 1426614L, 
1403126L, 2083126L, 1334063L, 1349385L, 1404482L, 2175545L, 1296600L, 
1293220L, 1336768L, 2148401L), admitDate = structure(c(14506, 
14633, 14882, 15172, 14945, 15632, 15482, 15601, 16096, 15843, 
16013, 15548, 15436, 15605, 16115, 15597, 15111, 15050, 15500, 
15896), class = "Date"), dcDate = structure(c(14510, 14636, 14886, 
15197, 14951, 15635, 15488, 15603, 16098, 15846, 16016, 15552, 
15438, 15606, 16118, 15598, 15113, 15058, 15501, 15915), class = "Date"), 
los = c(4, 3, 4, 25, 6, 3, 6, 2, 2, 3, 3, 4, 2, 1, 3, 1, 
2, 8, 1, 19)), .Names = c("Unit.Number", "admitDate", "dcDate", 
"los"), row.names = c(NA, 20L), class = "data.frame") 

First, I tried the code suggested by G. Grothendieck:

days <- seq(min(df$admitDate), max(df$dcDate), "day") 
no.patients <- data.frame(
    Date = days, 
    Num = sapply(days, function(d) sum(d >= df$admitDate & d <= df$dcDate)), 
    Patients = sapply(days, function(d) 
     toString(df$Unit.Number[d >= df$admitDate & d <= df$dcDate])) 
) 

Here is what happened:

> days <- seq(min(df$admitDate), max(df$dcDate), "day") 
Error in seq.int(0, to0 - from, by) : 'to' cannot be NA, NaN or infinite 
> no.patients <- data.frame(Date = d, 
+       Num = sapply(days, function(d) sum(d >= df$admitDate & d <=   df$dcDate))) 
Error in data.frame(Date = d, Num = sapply(days, function(d) sum(d >= : 
object 'd' not found 

Then I thought maybe I needed to get rid of the NAs. So here is what I did:

> df <- df[rowSums(is.na(df)) < 0, ] 

Then I tried again. Here is what I got:

> days <- seq(min(df$admitDate), max(df$dcDate), "day") 
Error in seq.int(0, to0 - from, by) : 'to' cannot be NA, NaN or infinite 
In addition: Warning messages: 
1: In min.default(numeric(0), na.rm = FALSE) : 
no non-missing arguments to min; returning Inf 
2: In max.default(numeric(0), na.rm = FALSE) : 
no non-missing arguments to max; returning -Inf 
> no.patients <- data.frame(Date = d, 
+       Num = sapply(days, function(d) sum(d >= df$admitDate & d <= df$dcDate))) 
Error in data.frame(Date = d, Num = sapply(days, function(d) sum(d >= : 
object 'd' not found 
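Two things in the transcript above cause these errors. rowSums(is.na(df)) < 0 is never TRUE (a count of missing values cannot be negative), so every row is dropped and min()/max() receive empty input; and Date = d refers to the argument of the anonymous function, which does not exist outside sapply() (the suggested code uses Date = days). The very first seq() error is what happens when a date column contains NA, because min()/max() then return NA. A minimal sketch of a corrected run, assuming a small hypothetical data frame with some missing dates:

```r
# Hypothetical toy data: stay 2 lacks a discharge date, stay 3 an admission date
df <- data.frame(
  Unit.Number = c(1L, 2L, 3L),
  admitDate = as.Date(c("2010-01-01", "2010-01-03", NA)),
  dcDate    = as.Date(c("2010-01-05", NA, "2010-01-12"))
)
# rowSums(is.na(df)) is the number of missing values per row, so "< 0" drops
# every row; keep the rows with zero missing values instead
clean <- df[rowSums(is.na(df)) == 0, ]   # same rows as df[complete.cases(df), ]
days <- seq(min(clean$admitDate), max(clean$dcDate), "day")
# Use Date = days here: d only exists inside the function passed to sapply()
no.patients <- data.frame(
  Date = days,
  Num = sapply(days, function(d) sum(d >= clean$admitDate & d <= clean$dcDate))
)
```

Alternatively, min(df$admitDate, na.rm = TRUE) and max(df$dcDate, na.rm = TRUE) give a valid range without dropping rows, but the per-day sum() would then also need na.rm = TRUE.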

Please use 'dput' to show enough data for an example. –


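When I try to copy and paste from df, it comes out unreadable: everything runs together instead of being in rows and columns. As you can see, I'm new to all of this. – user3399918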


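The point of 'dput' is that people answering the question can simply copy your output and paste it back into their session to reproduce your data exactly. See: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example –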

Answers


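Here is another way. It computes the size of a queue from entry and exit times, which in this case is the number of patients in the hospital at the same time: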

df <- structure(list(Unit.Number = c(2013459L, 2013459L, 2047815L, 
1362858L, 1331174L, 2068040L, 1363711L, 2175972L, 2036695L, 1426614L, 
1403126L, 2083126L, 1334063L, 1349385L, 1404482L, 2175545L, 1296600L, 
1293220L, 1336768L, 2148401L), admitDate = structure(c(14506, 
14633, 14882, 15172, 14945, 15632, 15482, 15601, 16096, 15843, 
16013, 15548, 15436, 15605, 16115, 15597, 15111, 15050, 15500, 
15896), class = "Date"), dcDate = structure(c(14510, 14636, 14886, 
15197, 14951, 15635, 15488, 15603, 16098, 15846, 16016, 15552, 
15438, 15606, 16118, 15598, 15113, 15058, 15501, 15915), class = "Date"), 
los = c(4, 3, 4, 25, 6, 3, 6, 2, 2, 3, 3, 4, 2, 1, 3, 1, 
2, 8, 1, 19)), .Names = c("Unit.Number", "admitDate", "dcDate", 
"los"), row.names = c(NA, 20L), class = "data.frame") 

# create dataframe for computing the size of the queue (concurrent patients) 
x <- data.frame(date = c(df$admitDate, df$dcDate) 
      , op = c(rep(1, nrow(df)), rep(-1, nrow(df))) 
      , Unit.Number = c(df$Unit.Number, df$Unit.Number) 
      ) 
# sort and calculate concurrent patients 
x <- x[order(x$date), ] # sort in time order 
x$cum <- cumsum(x$op) 

# 'x' will have the 'cum' equal to the number of patients concurrently. 
# for 'op' == 1, you have the patient ID and 'cum' will be the number of 
# patients at that time. 

plot(x$date, x$cum, type = 's') 

Here is what the first part of 'x' looks like:

> head(x,10) 
     date op Unit.Number cum 
1 2009-09-19 1  2013459 1 
21 2009-09-23 -1  2013459 0 
2 2010-01-24 1  2013459 1 
22 2010-01-27 -1  2013459 0 
3 2010-09-30 1  2047815 1 
23 2010-10-04 -1  2047815 0 
5 2010-12-02 1  1331174 1 
25 2010-12-08 -1  1331174 0 
18 2011-03-17 1  1293220 1 
38 2011-03-25 -1  1293220 0 
> 
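In the table above, several events can fall on the same date, so cum is an event-level running total. A minimal sketch for reducing it to one census value per date, assuming a small hypothetical toy data frame: keep the last event on each date. With this queue approach a patient no longer counts on the discharge date itself, and dates with no events are simply absent (the previous count carries over).

```r
# Hypothetical toy data with the question's column names
df <- data.frame(
  Unit.Number = c(1L, 2L, 3L),
  admitDate = as.Date(c("2010-01-01", "2010-01-03", "2010-01-03")),
  dcDate    = as.Date(c("2010-01-05", "2010-01-04", "2010-01-06"))
)
# Same event table as in the answer: +1 on admission, -1 on discharge
x <- data.frame(date = c(df$admitDate, df$dcDate),
                op = rep(c(1, -1), each = nrow(df)),
                Unit.Number = rep(df$Unit.Number, 2))
x <- x[order(x$date), ]
x$cum <- cumsum(x$op)
# The last event on each date holds the census at the end of that date
daily <- x[!duplicated(x$date, fromLast = TRUE), c("date", "cum")]
```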

This works like a charm. Thanks a lot. Now I'm going to study the code, and hopefully understand it and learn from it. – user3399918


Try this:

days <- seq(min(df$admitDate), max(df$dcDate), "day") 
no.patients <- data.frame(
     Date = days, 
     Num = sapply(days, function(d) sum(d >= df$admitDate & d <= df$dcDate)), 
     Patients = sapply(days, function(d) 
      toString(df$Unit.Number[d >= df$admitDate & d <= df$dcDate])) 
) 

which gives:

> head(no.patients) 
     Date Num Patients 
1 2009-09-19 1 2013459 
2 2009-09-20 1 2013459 
3 2009-09-21 1 2013459 
4 2009-09-22 1 2013459 
5 2009-09-23 1 2013459 
6 2009-09-24 0   

ADDED: the list of patients on each row. Fixed the example df.
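The two sapply() loops can also be written with outer(), which builds a days-by-stays logical matrix in one step. A sketch under the assumption that the whole matrix (length(days) times nrow(df) entries) fits in memory; the toy data and the name inside are hypothetical:

```r
# Hypothetical toy data with the question's column names
df <- data.frame(
  Unit.Number = c(1L, 2L, 3L),
  admitDate = as.Date(c("2010-01-01", "2010-01-03", "2010-01-03")),
  dcDate    = as.Date(c("2010-01-05", "2010-01-04", "2010-01-06"))
)
days <- seq(min(df$admitDate), max(df$dcDate), "day")
# inside[i, j] is TRUE when day i falls within stay j
# (admission and discharge days both count, as in the sapply() version)
inside <- outer(days, df$admitDate, ">=") & outer(days, df$dcDate, "<=")
no.patients <- data.frame(
  Date = days,
  Num = rowSums(inside),
  Patients = apply(inside, 1, function(r) toString(df$Unit.Number[r]))
)
```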


I tried it. See above (in the original question) for the error messages I got. – user3399918


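I updated the answer at one point; I think you grabbed it before my edit. I have added the first few lines of output to show that it works on the sample data provided. If you are still getting errors, you will need to provide a reproducible example that shows them. –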


After removing the NAs it works fine, and the result, plot(no.patients$Date, no.patients$Num), is the same as with the other answer's suggestion. Many thanks. – user3399918
