2015-05-29

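Summing values in R after applying a condition

I have a data set containing the number of medication courses each patient took on particular dates.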

subject<-c(111,111,111,222,222,333,333,333,333) 
date<-as.Date(c("2010-12-12","2011-12-01","2009-8-7","2010-5-7","2011-3-7","2011-8-5","2013-8-27","2016-9-3","2011-8-5")) 
medicationCourses<-c(1,0,NA,3,4,2,4,5,6) 

data<-data.frame(subject,date,medicationCourses) 

data 

  subject       date medicationCourses
1     111 2010-12-12                 1
2     111 2011-12-01                 0
3     111 2009-08-07                NA
4     222 2010-05-07                 3
5     222 2011-03-07                 4
6     333 2011-08-05                 2
7     333 2013-08-27                 4
8     333 2016-09-03                 5
9     333 2011-08-05                 6

I also have their hospital admission dates.

hospitalSubject<-c(111,222,333) 
admissionDate<-as.Date(c("2011-12-31","2013-12-31","2013-12-31")) 

hospitalData<-data.frame(hospitalSubject,admissionDate) 

hospitalData 

  hospitalSubject admissionDate
1             111    2011-12-31
2             222    2013-12-31
3             333    2013-12-31

I want to sum the number of medication courses taken on or before the admission date, producing the following result:

subject admissionDate totalMedicationCourses 
111   2011-12-31   1 
222   2013-12-31   7 
333   2013-12-31   12 

Could anyone let me know how to do this in R? I am a new R user, so any guidance would be much appreciated!

Answers

1

One option is to merge the two datasets on subject/hospitalSubject, subset the rows where date <= admissionDate, and then use aggregate:

d1 <- subset(merge(data, hospitalData, by.x='subject', 
      by.y='hospitalSubject'), date <= admissionDate) 

aggregate(medicationCourses~subject+admissionDate, d1, sum, 
       na.rm=TRUE, na.action=NULL) 
# subject admissionDate medicationCourses 
#1  111 2011-12-31     1 
#2  222 2013-12-31     7 
#3  333 2013-12-31    12 
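To see what each step contributes, the same base R pipeline can be split into its intermediate results. This is only an illustrative sketch using the question's data; the names `merged`, `kept` and `totals` are made up here, and the final rename is just one way to get the column name the question asks for.

```r
# Rebuild the question's data so the example is self-contained
subject <- c(111, 111, 111, 222, 222, 333, 333, 333, 333)
date <- as.Date(c("2010-12-12", "2011-12-01", "2009-8-7", "2010-5-7", "2011-3-7",
                  "2011-8-5", "2013-8-27", "2016-9-3", "2011-8-5"))
medicationCourses <- c(1, 0, NA, 3, 4, 2, 4, 5, 6)
data <- data.frame(subject, date, medicationCourses)

hospitalData <- data.frame(
  hospitalSubject = c(111, 222, 333),
  admissionDate = as.Date(c("2011-12-31", "2013-12-31", "2013-12-31"))
)

# Step 1: attach each subject's admission date to every medication row
merged <- merge(data, hospitalData, by.x = "subject", by.y = "hospitalSubject")

# Step 2: keep only the rows dated on or before the admission date
# (this drops the 2016-09-03 row for subject 333)
kept <- merged[merged$date <= merged$admissionDate, ]

# Step 3: sum per subject and admission date.
# na.action = NULL keeps rows with NA so that sum(..., na.rm = TRUE) can skip them
totals <- aggregate(medicationCourses ~ subject + admissionDate, data = kept,
                    FUN = sum, na.rm = TRUE, na.action = NULL)
names(totals)[3] <- "totalMedicationCourses"
totals
```

Without `na.action = NULL`, the formula interface of `aggregate` would drop the row with `NA` before summing; the totals are the same here, but a subject whose only rows are `NA` would then vanish from the output.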

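This gives the sum of 'medicationCourses' grouped by 'subject' and 'admissionDate'.

Alternatively, we can use data.table: convert the 'data.frame' to a 'data.table' (setDT(data)), set the key to 'subject' (setkey()), join with hospitalData, filter the rows where date <= admissionDate, and get the sum of 'medicationCourses' grouped by 'subject' and 'admissionDate'.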

library(data.table) 
setkey(setDT(data), subject)[hospitalData][date <= admissionDate, 
    list(TotalMedicationCourses=sum(medicationCourses, na.rm=TRUE)), 
     list(subject, admissionDate)] 
# subject admissionDate TotalMedicationCourses 
#1:  111 2011-12-31      1 
#2:  222 2013-12-31      7 
#3:  333 2013-12-31      12 

Or with dplyr:

library(dplyr) 
left_join(data, hospitalData, by=c('subject'='hospitalSubject')) %>% 
    filter(date <= admissionDate) %>% 
    group_by(subject, admissionDate) %>% 
    summarise(TotalMedicationCourses=sum(medicationCourses, na.rm=TRUE)) 
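One caveat that seems to apply to all three approaches: a subject whose medication rows all fall after the admission date, or who has no medication rows at all, disappears from the result, because every one of their rows is removed before grouping. If such subjects should instead appear with a total of 0, one possible base R sketch is shown here. It uses a small made-up data set, including a hypothetical subject 444 that is not in the original question, and `eligibleCourses` is a helper column invented for this example.

```r
# Small made-up data set; subject 444 is hypothetical and has no medication rows
data <- data.frame(
  subject = c(111, 111, 222, 333),
  date = as.Date(c("2010-12-12", "2011-12-01", "2010-05-07", "2016-09-03")),
  medicationCourses = c(1, NA, 3, 5)
)
hospitalData <- data.frame(
  hospitalSubject = c(111, 222, 333, 444),
  admissionDate = as.Date(c("2011-12-31", "2013-12-31", "2013-12-31", "2014-01-01"))
)

# all.y = TRUE keeps every hospital subject, even those without medication rows
merged <- merge(data, hospitalData, by.x = "subject", by.y = "hospitalSubject",
                all.y = TRUE)

# Apply the condition row by row instead of dropping rows:
# a course counts only if it is dated on or before admission and is not NA
counted <- !is.na(merged$date) & merged$date <= merged$admissionDate &
  !is.na(merged$medicationCourses)
merged$eligibleCourses <- ifelse(counted, merged$medicationCourses, 0)

totals <- aggregate(eligibleCourses ~ subject + admissionDate, data = merged, FUN = sum)
names(totals)[3] <- "totalMedicationCourses"
totals
```

Here subjects 333 (only course is after admission) and 444 (no courses) are kept with a total of 0 rather than being dropped. Whether 0 is the right value for them depends on the analysis, so treat this as one option rather than a fix.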
+1

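Thank you so much for these approaches. Thanks also to the other user who posted a reply but deleted it a few minutes ago. They were all very helpful!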