Unrolling an R loop that computes overlapping time intervals
I have data in this format (excerpt):
SW_Release deviceType configStartDate configEndDate
1: 04.05.00 21 2005-11-03 19:12:36 2006-02-28 10:19:27
2: 04.05.00 16 2005-11-04 03:59:05 2006-02-28 10:19:27
3: 04.05.00 20 2005-11-04 03:59:06 2006-02-28 10:19:27
4: 04.05.00 15 2005-11-04 03:59:06 2006-02-28 10:19:27
5: 04.05.00 19 2005-11-04 03:59:06 2006-02-28 10:19:27
6: 04.05.00 17 2005-11-04 03:59:06 2006-02-28 10:19:27
7: 04.07.03 16 2006-02-28 10:19:27 2006-03-29 01:00:39
8: 04.07.03 20 2006-02-28 10:19:27 2006-03-29 01:00:41
9: 04.07.01 15 2006-02-28 10:19:27 2006-03-29 01:00:41
10: 04.07.01 19 2006-02-28 10:19:27 2006-03-29 01:00:41
11: 04.07.01 17 2006-02-28 10:19:27 2006-03-29 01:00:42
12: 04.07.01 21 2006-02-28 10:19:27 2006-03-29 01:00:42
13: 04.07.01 18 2006-02-28 10:19:27 2006-03-29 01:00:42
14: 04.07.04 16 2006-03-29 01:00:40 2006-05-01 16:07:49
15: 04.07.04 20 2006-03-29 01:00:41 2006-05-01 16:07:50
16: 04.07.02 15 2006-03-29 01:00:41 2006-05-01 16:07:50
17: 04.07.02 19 2006-03-29 01:00:41 2006-05-01 16:07:51
18: 04.07.02 17 2006-03-29 01:00:42 2006-05-01 16:07:51
19: 04.07.02 21 2006-03-29 01:00:42 2006-05-01 16:07:51
20: 04.07.02 18 2006-03-29 01:00:42 2006-06-01 09:45:36
21: 04.07.04 16 2006-05-02 09:47:57 2006-06-01 09:45:25
22: 04.07.04 20 2006-05-02 09:47:57 2006-06-01 09:45:28
23: 04.07.02 15 2006-05-02 09:47:58 2006-06-01 09:45:31
24: 04.07.02 19 2006-05-02 09:47:58 2006-06-01 09:45:32
25: 04.07.02 17 2006-05-02 09:47:58 2006-06-01 09:45:34
26: 04.07.02 21 2006-05-02 09:47:58 2006-06-01 09:45:35
27: 04.07.05 16 2006-06-01 09:45:27 2006-08-14 17:54:15
28: 04.07.05 20 2006-06-01 09:45:29 2006-08-14 17:54:15
29: 04.07.06 15 2006-06-01 09:45:31 2007-12-12 11:03:00
30: 04.07.06 19 2006-06-01 09:45:33 2007-12-12 11:03:00
31: 04.07.03 17 2006-06-01 09:45:35 2006-08-14 17:54:16
32: 04.07.03 21 2006-06-01 09:45:35 2006-08-14 17:54:16
33: 04.07.04 18 2006-06-01 09:45:37 2007-12-12 11:03:00
34: 04.07.06 16 2006-08-14 17:54:15 2007-12-12 11:02:59
35: 04.07.06 20 2006-08-14 17:54:15 2007-12-12 11:02:59
36: 04.07.04 17 2006-08-14 17:54:16 2007-12-12 11:03:00
37: 04.07.04 21 2006-08-14 17:54:16 2007-12-12 11:03:00
38: 04.05.12 14 2011-06-17 15:40:13 2012-05-24 11:43:24
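I need to add up all the intervals (the span between the second-to-last and the last column), but as you can see, some rows have overlapping or partially overlapping intervals.
Before adding up all the days, I need to convert the full dataset (shown above) into something like: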
accumulated data:
configStartDate configEndDate
1: 2005-11-03 19:12:36 2007-12-12 11:03:00
2: 2011-06-17 15:40:13 2012-05-24 11:43:24
total days: 934.296
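The final summing step, once the overlaps are merged, could look roughly like this. It is a minimal sketch on made-up intervals rather than the data above; the names `merged`, `interval_days`, and `total_days` and the `tz = "UTC"` setting are assumptions for illustration, not part of the original code.

```r
# Hypothetical, already merged (non-overlapping) intervals.
# tz = "UTC" is an assumption to keep the arithmetic free of DST effects.
merged <- data.frame(
  configStartDate = as.POSIXct(c("2006-01-01 00:00:00", "2006-02-01 00:00:00"), tz = "UTC"),
  configEndDate   = as.POSIXct(c("2006-01-03 12:00:00", "2006-02-02 00:00:00"), tz = "UTC")
)

# Length of each interval in days, then the overall total.
interval_days <- as.numeric(
  difftime(merged$configEndDate, merged$configStartDate, units = "days")
)
total_days <- sum(interval_days)
print(total_days)  # 2.5 + 1 = 3.5
```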
Below is my R code for doing this (it has to be R, although I am considering rewriting it in C++ and using Rcpp):
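library(data.table)

merge_intervals <- function(interval_dt) {
  # Sort by start time and keep only the two interval columns
  interval_dt <- interval_dt[order(configStartDate), list(configStartDate, configEndDate)]
  # Seed the result with the earliest interval
  new_dt <- interval_dt[1, list(configStartDate, configEndDate)]
  # seq_len(n)[-1] is empty for a single-row input, unlike 2:n
  for (i in seq_len(nrow(interval_dt))[-1]) {
    buff <- interval_dt[i, list(configStartDate, configEndDate)]
    last <- nrow(new_dt)
    if (new_dt[last, configEndDate] >= buff[, configStartDate]) {
      # Overlapping or touching: extend the current interval if buff ends later
      if (new_dt[last, configEndDate] < buff[, configEndDate]) {
        new_dt[last, configEndDate := buff[, configEndDate]]
      }
    } else {
      # No overlap: start a new merged interval
      new_dt <- rbind(new_dt, buff)
    }
  }
  return(new_dt)
}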
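Right now the whole thing takes about 0.16 seconds to run (together with the other calculations), but for 3000 unique assets that adds up to about 8 minutes of computation time.
How can I convert the for loop into something faster to reduce the computation time? Thanks!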
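This should be vectorizable. How do you want to handle overlapping time intervals? Ignore the overlap, or merge the intervals into a new one and only consider the new interval? – Thierry
Sorry, but your example does not make clear to me what you are trying to do. How do you get from the 10 rows shown in your first block (all in 2006) to the two rows in the second block (spanning 2005-2012)? Can you describe exactly how to get from the sample input to the expected output? – josliber
I edited the sample to include all rows to make it clearer. –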
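Following the suggestion in the comments that the loop can be vectorized, here is one possible sketch using a running maximum of the end times. It assumes the merge semantics of the loop in the question (touching intervals are merged) and that both columns are POSIXct. The function name `merge_intervals_vec`, the helper column `grp`, and the `tz = "UTC"` setting are illustrative assumptions, not part of the original post.

```r
library(data.table)

# Sketch: merge overlapping intervals without an explicit row loop.
merge_intervals_vec <- function(interval_dt) {
  # Subsetting creates a new table, so the caller's data is not modified.
  dt <- interval_dt[order(configStartDate), list(configStartDate, configEndDate)]
  # Largest end time among all earlier rows (-Inf for the first row).
  prev_max_end <- shift(cummax(as.numeric(dt$configEndDate)), fill = -Inf)
  # A new merged interval begins where a start lies strictly after every
  # earlier end; start == previous end is merged, like the >= test in the loop.
  dt[, grp := cumsum(as.numeric(configStartDate) > prev_max_end)]
  out <- dt[, list(configStartDate = min(configStartDate),
                   configEndDate   = max(configEndDate)),
            by = grp]
  out[, grp := NULL]
  out[]
}

# A few rows taken from the sample in the question, deliberately unsorted.
sample_dt <- data.table(
  configStartDate = as.POSIXct(c("2011-06-17 15:40:13", "2006-03-29 01:00:42",
                                 "2005-11-03 19:12:36", "2006-05-02 09:47:57",
                                 "2006-02-28 10:19:27", "2006-03-29 01:00:40"),
                               tz = "UTC"),
  configEndDate   = as.POSIXct(c("2012-05-24 11:43:24", "2006-06-01 09:45:36",
                                 "2006-02-28 10:19:27", "2006-06-01 09:45:25",
                                 "2006-03-29 01:00:41", "2006-05-01 16:07:49"),
                               tz = "UTC")
)
merged <- merge_intervals_vec(sample_dt)
print(merged)
```

On this subset the first five rows chain together into one interval and the 2011 row stays separate. For many assets, the same function could probably be applied per group, for example `dt[, merge_intervals_vec(.SD), by = assetId]`, where `assetId` is a hypothetical column name and the call has not been benchmarked here.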