2015-09-26 30 views
2

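Count days from a specific date with Id as a "break variable"

I want to count the number of days from a specific date (the first date recorded for each Id), using the variable Id as a "break variable", and store the result in a new column. I would like the result to look like the data frame RESULT below.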

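I am collecting data on patients' progress (Variable_x), and I want to use the "Days" variable as the time variable in a mixed model.

Here are the variables: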

Id <- c(1, 1, 1, 1, 2, 2, 2, 5, 5, 5, 5, 5)
Date <- as.Date(c("2015-01-01", "2015-01-10", "2015-01-15", "2015-01-25",
                  "2013-02-01", "2013-03-20", "2013-04-03",
                  "2014-05-06", "2014-06-07", "2014-06-08", "2014-08-09", "2014-10-10"))
Variable_x <- c("70", "NA", "55", "30", "70", "60", "NA", "80", "60", "70", "50", "20")
Days <- c(0, 9, 14, 24, 0, 47, 61, 0, 32, 33, 95, 157)

Here is the data I have:

DATA <- data.frame(Id, Date, Variable_x) 

Here is the data I want:

RESULT <- data.frame(Id, Date, Days, Variable_x) 

Hopefully someone can come up with an answer or point me in the right direction.

Any help would be much appreciated.

Answers

2

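An option using data.table. We convert the 'data.frame' to a 'data.table' (setDT(DATA)) and, grouped by 'Id', take the difference between 'Date' and the lag of 'Date' (shift defaults to type="lag"), apply cumsum to those differences, and assign (:=) the output to create the 'Days' column.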

library(data.table)  # v1.9.6+
setDT(DATA)[, Days := cumsum(as.numeric(Date - shift(Date, fill = Date[1L]))), by = Id]
DATA
#     Id       Date Variable_x Days
#  1:  1 2015-01-01         70    0
#  2:  1 2015-01-10         NA    9
#  3:  1 2015-01-15         55   14
#  4:  1 2015-01-25         30   24
#  5:  2 2013-02-01         70    0
#  6:  2 2013-03-20         60   47
#  7:  2 2013-04-03         NA   61
#  8:  5 2014-05-06         80    0
#  9:  5 2014-06-07         60   32
# 10:  5 2014-06-08         70   33
# 11:  5 2014-08-09         50   95
# 12:  5 2014-10-10         20  157
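
As a side note, the per-group differences telescope, so the cumulative sum is the same as each date minus the first date of its group. Here is a shorter sketch based on that observation. It rebuilds the question's Id and Date columns so it runs on its own; the name `dt` is only for illustration, and it assumes the first row of each Id holds the reference date (swap in `min(Date)` if the rows might not be sorted).

```r
library(data.table)

# Rebuild the question's Id and Date columns so the sketch is self-contained.
dt <- data.table(
  Id = c(1, 1, 1, 1, 2, 2, 2, 5, 5, 5, 5, 5),
  Date = as.Date(c("2015-01-01", "2015-01-10", "2015-01-15", "2015-01-25",
                   "2013-02-01", "2013-03-20", "2013-04-03",
                   "2014-05-06", "2014-06-07", "2014-06-08",
                   "2014-08-09", "2014-10-10"))
)

# Days elapsed since the first date of each Id.
# Assumption: the first row per Id is the reference date.
dt[, Days := as.numeric(Date - Date[1L]), by = Id]
```

For this data it should give the same Days column as the shift/cumsum version above.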
2

You are probably looking for diff combined with one of R's many grouping functions.

Here is an example with "dplyr":

library(dplyr)
DATA %>%
  group_by(Id) %>%
  mutate(Days = cumsum(c(0, diff(Date))))
# Source: local data frame [12 x 4]
# Groups: Id [3]
#
#       Id       Date Variable_x  Days
#    (dbl)     (date)     (fctr) (dbl)
# 1      1 2015-01-01         70     0
# 2      1 2015-01-10         NA     9
# 3      1 2015-01-15         55    14
# 4      1 2015-01-25         30    24
# 5      2 2013-02-01         70     0
# 6      2 2013-03-20         60    47
# 7      2 2013-04-03         NA    61
# 8      5 2014-05-06         80     0
# 9      5 2014-06-07         60    32
# 10     5 2014-06-08         70    33
# 11     5 2014-08-09         50    95
# 12     5 2014-10-10         20   157
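
If you would rather not load a package, the same diff idea can be sketched in base R with ave(), which is one of those grouping functions. This is only a minimal sketch: it rebuilds the question's Id and Date columns so it runs on its own, and it assumes the rows within each Id are already in date order.

```r
# Rebuild the question's Id and Date columns so the sketch is self-contained.
Id <- c(1, 1, 1, 1, 2, 2, 2, 5, 5, 5, 5, 5)
Date <- as.Date(c("2015-01-01", "2015-01-10", "2015-01-15", "2015-01-25",
                  "2013-02-01", "2013-03-20", "2013-04-03",
                  "2014-05-06", "2014-06-07", "2014-06-08",
                  "2014-08-09", "2014-10-10"))
DATA <- data.frame(Id, Date)

# ave() applies FUN within each Id and returns the results in the original row order.
# as.numeric() turns each Date into a day count, so diff() yields gaps in days.
DATA$Days <- ave(as.numeric(DATA$Date), DATA$Id,
                 FUN = function(d) cumsum(c(0, diff(d))))
```

The Days column here is a plain numeric vector, which should be fine as a time variable in a model formula.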