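I have a sample dataset of bike trajectories. My goal is to figure out, on average, how much time elapses between visits to station B.
So far, I have been able to simply order the dataset: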
test[order(test$starttime, decreasing = FALSE),]
and find the row indices where start_station and end_station equal B:
which(test$start_station == 'B')
which(test$end_station == 'B')
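The next part is where I run into trouble. To calculate the time that elapses between the bike's visits to station B, we must take the difftime() between each record where start_station == "B" (the bike leaves) and the next record where end_station == "B" (the bike returns), even if that record happens to be the same row (see row 6).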
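Using the dataset below, we know that the bike spent 510 minutes away from station B between 07:30:00 and 16:00:00, 30 minutes between 18:00:00 and 18:30:00, and 210 minutes between 19:00:00 and 22:30:00, which averages to 250 minutes.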
How can I reproduce this output in R using difftime()?
> test
   bikeid start_station           starttime end_station             endtime
1       1             A 2017-09-25 01:00:00           B 2017-09-25 01:30:00
2       1             B 2017-09-25 07:30:00           C 2017-09-25 08:00:00
3       1             C 2017-09-25 10:00:00           A 2017-09-25 10:30:00
4       1             A 2017-09-25 13:00:00           C 2017-09-25 13:30:00
5       1             C 2017-09-25 15:30:00           B 2017-09-25 16:00:00
6       1             B 2017-09-25 18:00:00           B 2017-09-25 18:30:00
7       1             B 2017-09-25 19:00:00           A 2017-09-25 19:30:00
8       1             A 2017-09-25 20:00:00           C 2017-09-25 20:30:00
9       1             C 2017-09-25 22:00:00           B 2017-09-25 22:30:00
10      1             B 2017-09-25 23:00:00           C 2017-09-25 23:30:00
Here is the sample data:
> dput(test)
structure(list(bikeid = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), start_station = c("A",
"B", "C", "A", "C", "B", "B", "A", "C", "B"), starttime = structure(c(1506315600,
1506339000, 1506348000, 1506358800, 1506367800, 1506376800, 1506380400,
1506384000, 1506391200, 1506394800), class = c("POSIXct", "POSIXt"
), tzone = ""), end_station = c("B", "C", "A", "C", "B", "B",
"A", "C", "B", "C"), endtime = structure(c(1506317400, 1506340800,
1506349800, 1506360600, 1506369600, 1506378600, 1506382200, 1506385800,
1506393000, 1506396600), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("bikeid",
"start_station", "starttime", "end_station", "endtime"), row.names = c(NA,
-10L), class = "data.frame")
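The pairing described in the question (each departure from B matched with the next arrival at B) can be sketched in base R as below. This is only one possible approach, not a definitive answer. It assumes a single bike, rebuilds the sample data inline with the time zone fixed to UTC (an assumption made for reproducibility; the differences do not depend on it), and uses the illustrative names ts, departures, arrivals, next_idx and away, none of which appear in the original post.

```r
# Hypothetical helper: build POSIXct values on 2017-09-25 (UTC is an assumption).
ts <- function(x) as.POSIXct(paste("2017-09-25", x), tz = "UTC")

# Sample data mirroring the question's `test`.
test <- data.frame(
  bikeid        = 1,
  start_station = c("A", "B", "C", "A", "C", "B", "B", "A", "C", "B"),
  starttime     = ts(c("01:00:00", "07:30:00", "10:00:00", "13:00:00", "15:30:00",
                       "18:00:00", "19:00:00", "20:00:00", "22:00:00", "23:00:00")),
  end_station   = c("B", "C", "A", "C", "B", "B", "A", "C", "B", "C"),
  endtime       = ts(c("01:30:00", "08:00:00", "10:30:00", "13:30:00", "16:00:00",
                       "18:30:00", "19:30:00", "20:30:00", "22:30:00", "23:30:00")),
  stringsAsFactors = FALSE
)

# Order chronologically, as in the question.
test <- test[order(test$starttime), ]

departures <- test$starttime[test$start_station == "B"]  # bike leaves B
arrivals   <- test$endtime[test$end_station == "B"]      # bike is back at B

# For each departure, the index of the first arrival strictly after it.
# NA means the bike has not returned yet (here: the 23:00:00 departure).
next_idx <- vapply(
  seq_along(departures),
  function(i) which(arrivals > departures[i])[1],
  integer(1)
)

# Time spent away from B per trip, in minutes: 510, 30, 210, NA
away <- difftime(arrivals[next_idx], departures, units = "mins")

# Average over completed trips only: 250 minutes
mean(away, na.rm = TRUE)
```

Because the comparison uses the end time of any row, a trip that starts and ends at B in the same row (row 6) is picked up without special handling. With several bikes, the same steps would presumably have to be applied separately for each bikeid.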
A first step would be to convert to long format, like 'library(data.table); mtest = melt(setDT(test), id = "bikeid", meas = patterns("_station", "time"), variable.name = "event", value.name = c("station", "time")); mtest[.(factor(1:2), c("start", "end")), on = .(event), event := i.V2]; setkey(mtest, bikeid, time)', but I'm not sure of the best way after that. – Frank
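One way to continue from a long format like the one suggested in the comment is sketched below. It is written in base R rather than data.table (a deliberate substitution so that no package is required), it assumes a single bike and no departure and arrival at B sharing the same timestamp, and the names ts, long, b, is_trip and away are illustrative only.

```r
# Hypothetical helper: build POSIXct values on 2017-09-25 (UTC is an assumption).
ts <- function(x) as.POSIXct(paste("2017-09-25", x), tz = "UTC")

# Sample data mirroring the question's `test`.
test <- data.frame(
  bikeid        = 1,
  start_station = c("A", "B", "C", "A", "C", "B", "B", "A", "C", "B"),
  starttime     = ts(c("01:00:00", "07:30:00", "10:00:00", "13:00:00", "15:30:00",
                       "18:00:00", "19:00:00", "20:00:00", "22:00:00", "23:00:00")),
  end_station   = c("B", "C", "A", "C", "B", "B", "A", "C", "B", "C"),
  endtime       = ts(c("01:30:00", "08:00:00", "10:30:00", "13:30:00", "16:00:00",
                       "18:30:00", "19:30:00", "20:30:00", "22:30:00", "23:30:00")),
  stringsAsFactors = FALSE
)

# Long format: one row per event, either a departure ("start") or an arrival ("end").
long <- rbind(
  data.frame(bikeid = test$bikeid, event = "start",
             station = test$start_station, time = test$starttime,
             stringsAsFactors = FALSE),
  data.frame(bikeid = test$bikeid, event = "end",
             station = test$end_station, time = test$endtime,
             stringsAsFactors = FALSE)
)

# Keep only the events at station B, in chronological order.
b <- long[long$station == "B", ]
b <- b[order(b$time), ]

# A gap counts as time away from B when a departure is immediately
# followed by an arrival in that ordered sequence.
n       <- nrow(b)
is_trip <- b$event[-n] == "start" & b$event[-1] == "end"
away    <- difftime(b$time[-1], b$time[-n], units = "mins")[is_trip]

away        # 510, 30, 210 minutes
mean(away)  # 250 minutes
```

Filtering to station B first means the gaps between consecutive events alternate between time parked at B and time away from B, so only the start-then-end pairs need to be kept.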