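I am looking for a way to calculate time differences within each group ID. How can I do this with R or SQL? Here is part of my data (the dput output follows the table):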
ID   road  beginTime  endTime   Mon  Tue  Wed  Thu  Fri  Sat
666  757   9:00 AM    11:45 AM                           S
555  758   1:55 PM    3:45 PM   M         W
555  759   10:40 AM   12:30 PM  M         W
555  760   4:00 PM    5:50 PM        Tue       R
444  761   3:00 PM    4:25 PM        Tue       R
444  762   4:30 PM    7:15 PM   M
444  763   12:50 PM   2:40 PM                       Fri
444  764   10:40 AM   11:35 AM       Tue       R
222  765   11:45 AM   2:30 PM   M         W
222  766   6:00 PM    9:40 PM                  R
333  767   8:30 AM    11:15 AM  M         W
333  768   8:30 AM    11:15 AM       Tue       R
333  769   1:25 PM    2:50 PM        Tue       R
333  770   11:45 AM   1:10 PM   M         W
The same data as dput output:
structure(list(ID = c(666L, 555L, 555L, 555L, 444L, 444L, 444L,
444L, 222L, 222L, 333L, 333L, 333L, 333L), road = 757:770, beginTime = structure(c(11L,
2L, 3L, 7L, 6L, 8L, 5L, 3L, 4L, 9L, 10L, 10L, 1L, 4L), .Label = c("1:25 PM",
"1:55 PM", "10:40 AM", "11:45 AM", "12:50 PM", "3:00 PM", "4:00 PM",
"4:30 PM", "6:00 PM", "8:30 AM", "9:00 AM"), class = "factor"),
endTime = structure(c(4L, 9L, 5L, 11L, 10L, 12L, 7L, 3L,
6L, 13L, 2L, 2L, 8L, 1L), .Label = c("1:10 PM", "11:15 AM",
"11:35 AM", "11:45 AM", "12:30 PM", "2:30 PM", "2:40 PM",
"2:50 PM", "3:45 PM", "4:25 PM", "5:50 PM", "7:15 PM", "9:40 PM"
), class = "factor"), Mon = structure(c(1L, 2L, 2L, 1L, 1L,
2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L), .Label = c("", "M"), class = "factor"),
Tue = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L,
1L, 2L, 2L, 1L), .Label = c("", "Tue"), class = "factor"),
Wed = structure(c(1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
2L, 1L, 1L, 2L), .Label = c("", "W"), class = "factor"),
Thu = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 1L), .Label = c("", "R"), class = "factor"),
Fri = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("", "Fri"), class = "factor"),
Sat = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("", "S"), class = "factor")), .Names = c("ID",
"road", "beginTime", "endTime", "Mon", "Tue", "Wed", "Thu", "Fri",
"Sat"), class = "data.frame", row.names = c(NA, -14L))
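Each ID drives on different roads at different times of day (beginTime, endTime). I want to calculate the waiting (non-driving) time for each ID. For example, ID = 555 drives on Monday and Wednesday. Its first period runs from 10:40 AM to 12:30 PM. It then waits about 1.41 hours (85 minutes) before driving a second period from 1:55 PM to 3:45 PM. That 1.41-hour wait is what I need. The days this ID drives on Tuesday and Thursday have their own waiting time, computed separately. ID = 666 drives only one period, on Saturday, so its waiting time is 0. The difficulty with my data is that each ID has different periods on each day. Any suggestions? Many thanks!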
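To make the target number concrete, the wait in the ID = 555 example can be reproduced with plain arithmetic on minutes since midnight. This is only an illustrative sketch; the variable names are made up for this example and do not appear in the data.

```r
# Boundary times of the gap for ID = 555 on Monday/Wednesday,
# expressed as minutes since midnight.
first_end    <- 12 * 60 + 30   # first period ends at 12:30 PM
second_start <- 13 * 60 + 55   # second period starts at 1:55 PM

wait_minutes <- second_start - first_end   # 85
wait_hours   <- wait_minutes / 60          # 1.41666...
```

The exact value is 85/60 hours, which is 1.41 when truncated and 1.42 when rounded to two decimals.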
Can you 'dput' your data so we can test? – 989
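I suggest converting the wide (weekday) format to long format with a dayofweek column, or a date column instead if your data covers more than one week. Your fields would then be 'ID', 'road', 'beginTime', 'endTime', 'date'. From there it is easier to group by day/date with functions such as 'aggregate' or 'dplyr::group_by', and then use lead or lag to find the time between rows within each group. – r2evans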
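@r2evans, I have tried that approach before. But ID = 555, for example, drives the same period on two days a week, so how can I put both days into a single 'date' column? – user5843090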
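One possible reading of the wide-to-long suggestion, sketched in base R: a period that runs on two days simply becomes two rows, one per day, which is how both days fit into a single day column. The names 'trips', 'flag', 'to_minutes', 'long' and 'waits' are made up for this sketch, the data frame is retyped from the question rather than taken from the dput output, and the first period of each day is assumed to have a wait of 0.

```r
# Helper for this sketch: 14 empty strings with `label` at the given rows.
flag <- function(rows, label) replace(rep("", 14), rows, label)

trips <- data.frame(
  ID   = c(666, 555, 555, 555, 444, 444, 444, 444, 222, 222, 333, 333, 333, 333),
  road = 757:770,
  beginTime = c("9:00 AM", "1:55 PM", "10:40 AM", "4:00 PM", "3:00 PM",
                "4:30 PM", "12:50 PM", "10:40 AM", "11:45 AM", "6:00 PM",
                "8:30 AM", "8:30 AM", "1:25 PM", "11:45 AM"),
  endTime   = c("11:45 AM", "3:45 PM", "12:30 PM", "5:50 PM", "4:25 PM",
                "7:15 PM", "2:40 PM", "11:35 AM", "2:30 PM", "9:40 PM",
                "11:15 AM", "11:15 AM", "2:50 PM", "1:10 PM"),
  Mon = flag(c(2, 3, 6, 9, 11, 14), "M"),
  Tue = flag(c(4, 5, 8, 12, 13), "Tue"),
  Wed = flag(c(2, 3, 9, 11, 14), "W"),
  Thu = flag(c(4, 5, 8, 10, 12, 13), "R"),
  Fri = flag(7, "Fri"),
  Sat = flag(1, "S"),
  stringsAsFactors = FALSE
)

# Convert "h:mm AM/PM" strings (or factors) to minutes since midnight.
to_minutes <- function(x) {
  x  <- as.character(x)
  hh <- as.integer(sub(":.*$", "", x))
  mm <- as.integer(sub("^.*:([0-9]+) .*$", "\\1", x))
  hh <- hh %% 12 + ifelse(grepl("PM$", x), 12, 0)  # 12 AM -> 0, 12 PM -> 12
  hh * 60 + mm
}

# Wide to long: one row per (period, day) for every non-empty day flag.
day_cols <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
long <- do.call(rbind, lapply(day_cols, function(d) {
  on_day <- trips[trips[[d]] != "", c("ID", "road", "beginTime", "endTime")]
  on_day$day <- rep(d, nrow(on_day))
  on_day
}))

long$begin_min <- to_minutes(long$beginTime)
long$end_min   <- to_minutes(long$endTime)
long <- long[order(long$ID, long$day, long$begin_min), ]

# Lag within each ID/day group: the wait is this period's start minus the
# previous period's end, and 0 for the first period of the group.
key        <- paste(long$ID, long$day)
same_group <- c(FALSE, head(key, -1) == tail(key, -1))
prev_end   <- c(NA, head(long$end_min, -1))
long$wait_hours <- ifelse(same_group, (long$begin_min - prev_end) / 60, 0)

# Total waiting time per ID and day.
waits <- aggregate(wait_hours ~ ID + day, data = long, FUN = sum)
```

With this layout, 'waits' holds one row per ID and day. Overlapping periods within the same day would give negative waits, a case that does not occur in the sample data and is not handled here.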