2016-09-08

Estimating gaps between multiple dates

Is there a way to find the gaps between multiple timelines? For example, my data looks like this:

library(plyr);library(dplyr) 
library(googleVis) 

df <- data.frame(Language = structure(c(rep("English",7), rep("German",5), rep("French", 10)), class = "character"), 
       Students = c(LETTERS[1:7], LETTERS[1:5], LETTERS[1:10]), 
       Start = structure(c(16713,16713,16713,16744,16713,16714,16754,16729,16729,16729,16750,16769, 
            16724,16724,16745,16724,16759,16766,16723,16722,16736,16796), class = "Date"), 
       End = structure(c(16762,16720,16762,16755,16720,16764,16762,16765,16765,16749,16761,16770,16758, 
            16744,16758,16764,16765,16766,16726,16723,16758,16806), class = "Date")) 

ddply(df, .(Language), summarise, 
     FirstDay = min(Start), 
     LastDay = max(End), 
     Duration = LastDay - FirstDay) 

# rowlabel must name an existing column; df has no "Class" column, so group rows by Language
plot(gvisTimeline(data=df, rowlabel = "Language", start = "Start", end = "End", options=list(width=600, height=1000))) 

I am after the gaps during which no student is attending class. These gaps are highlighted in red in the image below.

[Image: timeline chart with the gaps highlighted in red]

Answer


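This is a fairly typical problem. One solution is to filter the rows on whether the start date is later than the previous maximum end date, assuming the rows are sorted by start date. lag() and cummax() can be used to find the previous maximum end date. Since cummax() is not defined for the Date class, we can convert the dates to integers, apply cummax(), and then convert the result back to Date: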

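library(dplyr) 
df %>% 
     # sort by start date so the running maximum covers all earlier intervals 
     arrange(Start) %>% group_by(Language) %>% 
     # latest end date among the previous rows of the same language 
     mutate(End_Max = lag(as.Date(cummax(as.integer(End)), origin = "1970-01-01"))) %>% 
     # keep rows that start more than one day after everything before them has ended 
     filter(Start > End_Max + 1) %>% select(Language, End_Max, Start) 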

# Source: local data frame [2 x 3] 
# Groups: Language [2] 

# Language End_Max  Start 
# <fctr>  <date>  <date> 
#1 German 2015-11-26 2015-11-30 
#2 French 2015-11-27 2015-12-27
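
If the length of each gap is also wanted, the same idea can be sketched in base R. This is only an illustration under some assumptions: `find_gaps`, `toy`, and the column names `GapStart`, `GapEnd`, and `Days` are made up for this example and are not part of the original answer, and a gap is taken to be the whole days strictly between the previous maximum end date and the next start date.

```r
# Toy intervals (hypothetical data, not the df from the question)
toy <- data.frame(
  Start = as.Date(c("2015-10-05", "2015-10-01", "2015-10-20")),
  End   = as.Date(c("2015-10-12", "2015-10-08", "2015-10-25"))
)

# Hypothetical helper: returns one row per gap in a set of intervals
find_gaps <- function(d) {
  d <- d[order(d$Start), ]
  # Latest end date seen so far; cummax() needs numbers, so convert and convert back
  run_max <- as.Date(cummax(as.integer(d$End)), origin = "1970-01-01")
  # Shift by one row so each interval is compared only with the intervals before it
  prev_max <- c(as.Date(NA), head(run_max, -1))
  is_gap <- !is.na(prev_max) & d$Start > prev_max + 1
  data.frame(
    GapStart = prev_max[is_gap] + 1,
    GapEnd   = d$Start[is_gap] - 1,
    Days     = as.integer(d$Start[is_gap] - prev_max[is_gap]) - 1L
  )
}

gaps <- find_gaps(toy)
print(gaps)
# one gap: 2015-10-13 to 2015-10-19, 7 days
```

To apply it per language on the data from the question, something like `lapply(split(df, df$Language), find_gaps)` should give one table of gaps for each language.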