2017-02-01

Finding the closest available time slot for each patient

I have two data sets:

First set:

patient<-c("A","A","B","B","C","C","C","C") 
arrival<-c("11:00","11:00","13:00","13:00","14:00","14:00","14:00","14:00") 
lastRow<-c("","Yes","","Yes","","","","Yes") 

data1<-data.frame(patient,arrival,lastRow) 

The other data set:

patient<-c("A","A","A","A","B","B","B","C","C","C") 
availableSlot<-c("11:15","11:35","11:45","11:55","12:55","13:55","14:00","14:00","14:10","17:00") 

data2<-data.frame(patient, availableSlot) 

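I want to add a column to the first data set so that, on the last row of each patient, it shows the available slot closest to that patient's arrival time.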

How can I do this in R? The result should be:

patient arrival lastRow availableSlot 
     A 11:00   
     A 11:00  Yes  11:15 
     B 13:00   
     B 13:00  Yes  12:55 
     C 14:00   
     C 14:00   
     C 14:00   
     C 14:00  Yes  14:00 

I'd be grateful if anyone could show me how.


You probably need to convert your hour columns to a date/time class first. – Cath


So after that conversion, does this work? merge(data1, data2[!(data2$patient),], by = 'patient'), assuming your data2 is ordered by arrival. – Sotos


@Sotos No, that is just luck. The OP asked for the "closest" slot, which in this data happens to coincide with the first one. – Frank
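To make Frank's point concrete, here is a small base R sketch with hypothetical slots (they are not from the question) in which the first slot and the closest slot differ. The to_minutes helper is also hypothetical; it follows Cath's advice of putting the times on a numeric scale before subtracting them.

```r
# Hypothetical helper: minutes after midnight for "HH:MM" strings
to_minutes <- function(hhmm) {
  hm <- do.call(rbind, strsplit(hhmm, ":", fixed = TRUE))
  as.numeric(hm[, 1]) * 60 + as.numeric(hm[, 2])
}

arrival <- "13:00"
slots   <- c("11:30", "12:55", "13:55")  # hypothetical slots, already in time order

# Taking the first row per patient (what the merge idea relies on)
first_slot <- slots[1]

# Taking the slot with the smallest absolute gap to the arrival time
closest_slot <- slots[which.min(abs(to_minutes(slots) - to_minutes(arrival)))]

first_slot
closest_slot
```

Here the gaps are 90, 5 and 55 minutes, so the first slot is 11:30 while the closest one is 12:55.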

Answers


I would use data.table, first tidying up by converting the times to ITime and dropping the redundant rows:

library(data.table) 
setDT(data1)[, arrival := as.ITime(as.character(arrival))] 
setDT(data2)[, availableSlot := as.ITime(as.character(availableSlot))] 
DT1 = unique(data1, by="patient", fromLast=TRUE) 
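To check what these two preparation steps produce, here is a sketch that rebuilds the question's first table (with stringsAsFactors = FALSE, an assumption that makes the as.character() call unnecessary) and compares unique(..., fromLast = TRUE) with an explicit lastRow == "Yes" filter. In this data both keep the same three rows.

```r
library(data.table)

# The question's first table, with text kept as character
data1 <- data.frame(
  patient = c("A","A","B","B","C","C","C","C"),
  arrival = c("11:00","11:00","13:00","13:00","14:00","14:00","14:00","14:00"),
  lastRow = c("","Yes","","Yes","","","","Yes"),
  stringsAsFactors = FALSE
)

# Convert "HH:MM" text to data.table's integer time-of-day class
setDT(data1)[, arrival := as.ITime(arrival)]

# Keep the last row of each patient
DT1 <- unique(data1, by = "patient", fromLast = TRUE)

# The same rows selected explicitly via the lastRow flag
yes_rows <- data1[lastRow == "Yes"]

DT1
```

The fromLast version depends on the "Yes" row being the last row of each patient; the explicit filter depends only on the flag.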

Then you can do a "rolling join":

res = data2[DT1, on=.(patient, availableSlot = arrival), roll="nearest", 
    .(patient, availableSlot = x.availableSlot)] 

# patient availableSlot 
# 1:  A  11:15:00 
# 2:  B  12:55:00 
# 3:  C  14:00:00 

How it works

The syntax is x[i, on=, roll=, j].

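  • on= gives the columns to join on.
  • This is a join: for each row of i, we look for a matching row in x.
  • With roll="nearest", the last column in on= is "rolled" to the nearest match.
  • The on= columns of the original tables can be referred to with the x.* or i.* prefixes.
  • The j argument should give a list of columns; .() is an alias for list().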
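A minimal sketch of how roll changes the match, using a hypothetical one-patient table (slots, arrivals and the lookup helper are made up for illustration). roll = Inf and roll = -Inf are data.table's forward (LOCF) and backward (NOCB) rolls, shown only for contrast with "nearest".

```r
library(data.table)

# Hypothetical toy tables: one patient with three free slots and one arrival
slots    <- data.table(patient = "B", slot = as.ITime(c("12:55", "13:55", "14:30")))
arrivals <- data.table(patient = "B", arrival = as.ITime("13:00"))

# patient must match exactly; the last on= column (slot = arrival) is rolled
# according to r, and x.slot returns the slot actually found in `slots`
lookup <- function(r) slots[arrivals, on = .(patient, slot = arrival), roll = r, x.slot]

lookup("nearest")  # 12:55 is 5 minutes away, 13:55 is 55 minutes away
lookup(Inf)        # last slot at or before 13:00
lookup(-Inf)       # first slot at or after 13:00
lookup(FALSE)      # exact matches only; no slot at 13:00, so NA
```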

See the package's introductory materials at http://r-datatable.com/Getting-started, and type ?data.table for the documentation on rolling joins.


I would stop at res, but if you really want the result back in your original table...

# a very nonstandard step: 
data1[lastRow == "Yes", availableSlot := res$availableSlot ] 

# patient arrival lastRow availableSlot 
# 1:  A 11:00:00     <NA> 
# 2:  A 11:00:00  Yes  11:15:00 
# 3:  B 13:00:00     <NA> 
# 4:  B 13:00:00  Yes  12:55:00 
# 5:  C 14:00:00     <NA> 
# 6:  C 14:00:00     <NA> 
# 7:  C 14:00:00     <NA> 
# 8:  C 14:00:00  Yes  14:00:00 

Now data1 has the new column availableSlot, much as it would after data1$col <- val.
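The positional assignment above relies on res and the lastRow == "Yes" rows being in the same patient order. As a sketch of an alternative (an update join, swapped in here and not part of the original answer), the rows can be matched on patient and on the lastRow flag instead. The data is rebuilt so the block runs on its own; res is assumed to hold the rolling-join result shown earlier.

```r
library(data.table)

# data1 after the ITime conversion, as in the answer
data1 <- data.table(
  patient = c("A","A","B","B","C","C","C","C"),
  arrival = as.ITime(c("11:00","11:00","13:00","13:00","14:00","14:00","14:00","14:00")),
  lastRow = c("","Yes","","Yes","","","","Yes")
)
# Assumed: the rolling-join result from above
res <- data.table(patient = c("A","B","C"),
                  availableSlot = as.ITime(c("11:15","12:55","14:00")))

# Update join: match each res row to the data1 row with the same patient and
# lastRow == "Yes", then copy i's slot into data1 by reference
data1[res[, .(patient, lastRow = "Yes", availableSlot)],
      on = .(patient, lastRow),
      availableSlot := i.availableSlot]

data1
```

Rows without a match keep NA, as in the positional version, but the result no longer depends on row order.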


Brilliant explanation, +1! –


Great explanation, as always. – akrun


Thank you very much for your help. May I ask what the x in x.availableSlot refers to? –
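One way to see what the x. prefix does is to run the answer's join and ask for the join column both with and without it. In x[i, ...] the x table is data2, and ?data.table notes that x's join columns are otherwise masked by i's; so the unprefixed availableSlot should return the arrival time that was looked up, while x.availableSlot returns the slot actually found. The column names from_x and unprefixed are made up for this sketch.

```r
library(data.table)

data2 <- data.table(
  patient = c("A","A","A","A","B","B","B","C","C","C"),
  availableSlot = as.ITime(c("11:15","11:35","11:45","11:55","12:55",
                             "13:55","14:00","14:00","14:10","17:00"))
)
# The de-duplicated first table from the answer
DT1 <- data.table(patient = c("A","B","C"),
                  arrival = as.ITime(c("11:00","13:00","14:00")))

res2 <- data2[DT1, on = .(patient, availableSlot = arrival), roll = "nearest",
              .(patient,
                from_x     = x.availableSlot,  # slot found in data2 (the x table)
                unprefixed = availableSlot)]   # join column, masked by i's arrival

res2
```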


Here is a solution (based on joel.wilson's answer to my question) that works in base R:

#Convert times to POSIXct (the current date is attached; only the time of day matters)
data1$arrival = as.POSIXct(as.character(data1$arrival), format = "%H:%M")
data2$availableSlot = as.POSIXct(as.character(data2$availableSlot), format = "%H:%M")

#For each row, look up the slot of the same patient closest to data1$arrival
data1$availableSlot = mapply(function(p, x) {
        slots = data2$availableSlot[as.character(data2$patient) == p]
        slots[which.min(abs(slots - x))]
    }, as.character(data1$patient), data1$arrival, USE.NAMES = FALSE)

#mapply drops the POSIXct class, so convert back, then keep just hours and minutes
data1$availableSlot = strftime(as.POSIXct(data1$availableSlot,
           origin = "1970-01-01"), format = "%H:%M")
data1$arrival = strftime(data1$arrival, format = "%H:%M")

#Blank out the slot on rows where lastRow is not "Yes"
data1$availableSlot[which(data1$lastRow != "Yes")] = ""
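A different base R route, sketched here as an alternative rather than as part of the answer above, pairs each last row with every slot of the same patient via merge() and then keeps the smallest gap per patient with ave(). The to_min helper is hypothetical, and the data is rebuilt with character columns so the block runs on its own.

```r
# The question's data as plain character columns
data1 <- data.frame(
  patient = c("A","A","B","B","C","C","C","C"),
  arrival = c("11:00","11:00","13:00","13:00","14:00","14:00","14:00","14:00"),
  lastRow = c("","Yes","","Yes","","","","Yes"),
  stringsAsFactors = FALSE
)
data2 <- data.frame(
  patient = c("A","A","A","A","B","B","B","C","C","C"),
  availableSlot = c("11:15","11:35","11:45","11:55","12:55",
                    "13:55","14:00","14:00","14:10","17:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: minutes after midnight for "HH:MM" strings
to_min <- function(hhmm) {
  hm <- do.call(rbind, strsplit(hhmm, ":", fixed = TRUE))
  as.numeric(hm[, 1]) * 60 + as.numeric(hm[, 2])
}

# Pair each last row with every slot of the same patient
last  <- data1[data1$lastRow == "Yes", c("patient", "arrival")]
pairs <- merge(last, data2, by = "patient")
pairs$gap <- abs(to_min(pairs$availableSlot) - to_min(pairs$arrival))

# Flag the smallest gap within each patient (which.min takes the first on ties)
is_best <- ave(pairs$gap, pairs$patient,
               FUN = function(g) seq_along(g) == which.min(g)) == 1
best <- pairs[is_best, c("patient", "availableSlot")]

# Write the slot onto the last rows only, leaving the other rows blank
data1$availableSlot <- ifelse(data1$lastRow == "Yes",
                              best$availableSlot[match(data1$patient, best$patient)],
                              "")
data1
```

The merge step materialises every patient/slot pair, which is fine at this size; the data.table rolling join avoids that for large tables.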