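How to create consecutive columns (R)

2016-03-15

I have prescription record data and want to know how many prescriptions each person had per year, from their first issue date to the end of their record. Example data (the first 5 rows for each ID):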

 ID Issue_Date index.date other.drugs 
    1: 1 2000-02-08 2011-02-03   1 
    2: 1 2000-04-04 2011-02-03   0 
    3: 1 2000-05-30 2011-02-03   1 
    4: 1 2000-07-25 2011-02-03   1 
    5: 1 2000-08-22 2011-02-03   1 
---          
    1: 2 2007-03-23 2009-04-03   1 
    2: 2 2007-04-04 2009-04-03   1 
    3: 2 2007-04-23 2009-04-03   1 
    4: 2 2007-04-23 2009-04-03   0 
    5: 2 2007-05-21 2009-04-03   1 

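The other.drugs column is an indicator variable that shows whether the prescription issued on that date was for something other than the prescription of interest in the study. index.date is the date the person entered the study. There are more than 1000 IDs; only 2 are shown here.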

I want the sum of other.drugs for each year following the first Issue_Date. I calculated the first year separately with the following code:

dt <- dt[, yearend.1 := Issue_Date[1]+365, by = ID] 
dt <- dt[(Issue_Date<=yearend.1), comorbid.1 := sum(other.drugs), by = ID] 
dt <- dt[, comorbid.1:= comorbid.1[!is.na(comorbid.1)][1], by = ID] 
# the last line copies the value to each cell the ID occupies in the data.table for that column instead of having NA's 

This gives the following result:

 ID Issue_Date index.date other.drugs yearend.1 comorbid.1 
    1: 1 2000-02-08 2011-02-03   1 2001-02-07   8 
    2: 1 2000-04-04 2011-02-03   1 2001-02-07   8 
    3: 1 2000-05-30 2011-02-03   1 2001-02-07   8 
    4: 1 2000-07-25 2011-02-03   1 2001-02-07   8 
    5: 1 2000-08-22 2011-02-03   1 2001-02-07   8 
--- 
    1: 2 2007-03-23 2009-04-03   1 2008-03-22   30 
    2: 2 2007-04-04 2009-04-03   1 2008-03-22   30 
    3: 2 2007-04-23 2009-04-03   1 2008-03-22   30 
    4: 2 2007-04-23 2009-04-03   1 2008-03-22   30 
    5: 2 2007-05-21 2009-04-03   1 2008-03-22   30 

Interpretation: in the year after their first Issue_Date, ID 1 was prescribed 8 other drugs and ID 2 was prescribed 30.

For years 2-10 (the longest record is 11 years) I wrote the following loop:

years <- seq(730, 3650, 365) 
# number of days in 2-10 years. 
years2 <- seq(2,10,1) 
# numbering the years for column names 
colnames <- paste0("yearend.", years2) 
colnames2 <- paste0("comorbid.", years2) 
# names of columns to be used 

for (i in 1:length(years)) { 
    dt <- dt[, colnames[i] := Issue_Date[1]+years[i], by = ID] 
    dt <- dt[(Issue_Date>=(as.Date(colnames[i], "%d-%m-%Y")) & Issue_Date<(as.Date(colnames[i+1], "%d-%m-%Y"))), 
     colnames2[i] := sum(other.drugs), by = ID] 
    dt <- dt[, colnames2[i]:= colnames2[i][!is.na(colnames2[i])][1], by = ID] 
} 

But the new columns it was supposed to create contain this instead:

 ID Issue_Date index.date other.drugs yearend.1 comorbid.1 yearend.2 comorbid.2 yearend.3 comorbid.3 
    1: 1 2000-02-08 2011-02-03   1 2001-02-07   8 2002-02-07 comorbid.2 2003-02-07 comorbid.3 
    2: 1 2000-04-04 2011-02-03   1 2001-02-07   8 2002-02-07 comorbid.2 2003-02-07 comorbid.3 
    3: 1 2000-05-30 2011-02-03   1 2001-02-07   8 2002-02-07 comorbid.2 2003-02-07 comorbid.3 
    4: 1 2000-07-25 2011-02-03   1 2001-02-07   8 2002-02-07 comorbid.2 2003-02-07 comorbid.3 
    5: 1 2000-08-22 2011-02-03   1 2001-02-07   8 2002-02-07 comorbid.2 2003-02-07 comorbid.3 
--- 

I would like to know where my loop is going wrong. Any help is much appreciated.

Answer

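Whenever a column name used inside a data.table expression actually comes from an R variable, you need to wrap it in get(). Otherwise the expression works on the character string holding the name rather than on the column, which is why your new columns were filled with the text comorbid.2, comorbid.3. So you should rewrite your loop like this: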

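for (i in 1:length(years)) { 
    # the left-hand side of := is an expression here, so it is evaluated to the new column name 
    dt <- dt[, colnames[i] := Issue_Date[1]+years[i], by = ID] 
    # get() turns the stored name into the column itself; note that the column named by 
    # colnames[i+1] must already exist at this point (see the second caveat below) 
    dt <- dt[(Issue_Date>=(as.Date(get(colnames[i]), "%d-%m-%Y")) & Issue_Date<(as.Date(get(colnames[i+1]), "%d-%m-%Y"))), 
     colnames2[i] := sum(other.drugs), by = ID] 
    # copy the first non-NA value to every row of the ID 
    dt <- dt[, colnames2[i]:= get(colnames2[i])[!is.na(get(colnames2[i]))][1], by = ID] 
} 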
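
To make the effect of get() concrete, here is a minimal sketch on made-up data (the table toy and the variable col are hypothetical, not part of the question). It mirrors the last line of the loop: without get() the expression only manipulates the string, with get() it reads the column.

```r
library(data.table)

# Hypothetical toy data, not the asker's records
toy <- data.table(ID = c(1, 1, 2), x = c(NA, 5, 7))
col <- "x"   # a column name stored in an R variable

# Without get(): col is just the character string "x"
no_get <- toy[, col[!is.na(col)][1], by = ID]

# With get(): the expression operates on the column x itself
with_get <- toy[, get(col)[!is.na(get(col))][1], by = ID]

print(no_get)    # V1 holds the string "x" for every ID
print(with_get)  # V1 holds the first non-NA value of x per ID
```

The first result is the same symptom as the comorbid.2 strings in the question's output.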

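I couldn't actually test your code as it stands, because I ran into 2 problems:

  • I don't have enough data to get anything out of your condition Issue_Date>...
  • Maybe I'm missing something, but in your loop you try to use colnames[i+1], i.e. yearend.X, before it has actually been created (perhaps you ran the loop several times, and that is why you didn't get an error?)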
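
One way around the second problem is to create each boundary column before anything refers to it, and to define the window for year k from that single column. The following is only a sketch under assumptions: the data in toy is made up, rows are assumed to be sorted by date within each ID, and year k is assumed to cover the half-open interval from 365*(k-1) days up to (but not including) 365*k days after the first Issue_Date. The names n_years, end_col and count_col are hypothetical.

```r
library(data.table)

# Hypothetical toy data: two people with prescriptions spread over about 3 years
toy <- data.table(
  ID = c(1, 1, 1, 1, 2, 2, 2),
  Issue_Date = as.Date(c("2000-02-08", "2000-08-22", "2001-03-01", "2002-05-10",
                         "2007-03-23", "2008-06-01", "2008-07-01")),
  other.drugs = c(1, 1, 1, 0, 1, 1, 1)
)

n_years <- 3
for (k in seq_len(n_years)) {
  end_col   <- paste0("yearend.", k)
  count_col <- paste0("comorbid.", k)
  # Create the boundary column for year k before it is used
  toy[, (end_col) := Issue_Date[1] + 365 * k, by = ID]
  # Sum other.drugs inside the assumed window [yearend.k - 365, yearend.k)
  toy[, (count_col) := sum(other.drugs[Issue_Date >= get(end_col) - 365 &
                                       Issue_Date <  get(end_col)]),
      by = ID]
}

print(toy)
```

Because the sum is computed over a subset inside j rather than by filtering rows in i, every row of an ID receives the value directly, and a year with no prescriptions gets 0 instead of NA, so the extra step that copies the first non-NA value is not needed in this variant.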

I did something like this to test it; of course the comorbid.2 values don't mean anything:

dt 
    ID Issue_Date index.date other.drugs yearend.1 comorbid.1 
1: 1 00-02-08 2011-02-03   1 01-02-07   4 
2: 1 00-04-04 2011-02-03   0 01-02-07   4 
3: 1 00-05-30 2011-02-03   1 01-02-07   4 
4: 1 00-07-25 2011-02-03   1 01-02-07   4 
5: 1 00-08-22 2011-02-03   1 01-02-07   4 
6: 2 07-03-23 2009-04-03   1 08-03-22   4 
7: 2 07-04-04 2009-04-03   1 08-03-22   4 
8: 2 07-04-23 2009-04-03   1 08-03-22   4 
9: 2 07-04-23 2009-04-03   0 08-03-22   4 
10: 2 07-05-21 2009-04-03   1 08-03-22   4 

i <- 1 
dt <- dt[, colnames[i] := Issue_Date[1]+years[i], by = ID] 
dt <- dt[Issue_Date<get(colnames[i]), 
     colnames2[i] := sum(other.drugs), by = ID] 
dt <- dt[, colnames2[i]:= get(colnames2[i])[!is.na(get(colnames2[i]))][1], by = ID] 

dt 
    ID Issue_Date index.date other.drugs yearend.1 comorbid.1 yearend.2 comorbid.2 
1: 1 00-02-08 2011-02-03   1 01-02-07   4 02-02-07   4 
2: 1 00-04-04 2011-02-03   0 01-02-07   4 02-02-07   4 
3: 1 00-05-30 2011-02-03   1 01-02-07   4 02-02-07   4 
4: 1 00-07-25 2011-02-03   1 01-02-07   4 02-02-07   4 
5: 1 00-08-22 2011-02-03   1 01-02-07   4 02-02-07   4 
6: 2 07-03-23 2009-04-03   1 08-03-22   4 09-03-22   4 
7: 2 07-04-04 2009-04-03   1 08-03-22   4 09-03-22   4 
8: 2 07-04-23 2009-04-03   1 08-03-22   4 09-03-22   4 
9: 2 07-04-23 2009-04-03   0 08-03-22   4 09-03-22   4 
10: 2 07-05-21 2009-04-03   1 08-03-22   4 09-03-22   4 

Hope it helps.