2017-04-12 80 views

How do I calculate the duration (in minutes) of a condition in my data set?

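My data set contains the start and finish times for many people (ID) working in different areas (Location) on various days of the week (Day). An example of my data set is below: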

> head(WeekOne, 15) 
       Start    Finish Day  ID Location 
1 2017-04-12 00:00:00 2017-04-12 00:02:55 D1 Daniel Office 
2 2017-04-12 00:02:55 2017-04-12 00:06:18 D1 Daniel Office 
3 2017-04-12 00:06:18 2017-04-12 00:08:20 D1 Daniel OnSite 
4 2017-04-12 00:08:20 2017-04-12 00:08:40 D1 Daniel OnSite 
5 2017-04-12 00:08:40 2017-04-12 00:10:11 D1 Daniel Travel 
6 2017-04-12 00:10:11 2017-04-12 00:10:18 D1 Daniel Travel 
7 2017-04-12 00:10:18 2017-04-12 00:17:52 D1 Daniel Travel 
8 2017-04-12 00:17:52 2017-04-12 00:19:00 D1 Daniel Travel 
9 2017-04-12 00:19:00 2017-04-12 00:19:56 D1 Daniel OnSite 
10 2017-04-12 00:19:56 2017-04-12 00:28:48 D1 Daniel OnSite 
11 2017-04-12 00:00:00 2017-04-12 00:03:52 D2 Daniel OnSite 
12 2017-04-12 00:03:52 2017-04-12 00:04:05 D2 Daniel Office 
13 2017-04-12 00:04:05 2017-04-12 00:08:32 D2 Daniel Office 
14 2017-04-12 00:08:32 2017-04-12 00:16:01 D2 Daniel Travel 
15 2017-04-12 00:16:01 2017-04-12 00:25:35 D2 Daniel OnSite 

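I want to know the total time, in minutes, that each ID spent in each Location over the week. The maximum level of Day is D7, and I have a separate data.frame for each week, so I only need to iterate over Location and ID.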

What I have tried

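I tried the code below, but it returns the minutes in a strange format and does not account for multiple visits to the same location on one day. For example, Daniel visits OnSite twice on D1.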

WeekOne %>% 
    group_by(ID, Location) %>% 
    summarise(Duration = max(Finish) - min(Start)) 
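
To illustrate why the first-arrival-to-last-departure span overcounts, here is a minimal base R sketch using only Daniel's D1 rows from the sample above. It assumes Start and Finish are POSIXct (the question does not show their class); the names `D1`, `span_mins`, and `sum_mins` are made up for this example. Subtracting two date-times yields a `difftime` whose units R chooses automatically, which may be the "strange format" mentioned above, so the sketch requests minutes explicitly.

```r
# Daniel's D1 rows from the sample above; POSIXct columns are an assumption.
finish <- c("00:02:55", "00:06:18", "00:08:20", "00:08:40", "00:10:11",
            "00:10:18", "00:17:52", "00:19:00", "00:19:56", "00:28:48")
start  <- c("00:00:00", head(finish, -1))
D1 <- data.frame(
  Start    = as.POSIXct(paste("2017-04-12", start),  tz = "UTC"),
  Finish   = as.POSIXct(paste("2017-04-12", finish), tz = "UTC"),
  Location = rep(c("Office", "OnSite", "Travel", "OnSite"), times = c(2, 2, 4, 2))
)

onsite <- D1[D1$Location == "OnSite", ]

# Span from first arrival to last departure: this also covers the Travel
# block that sits between the two OnSite visits.
span_mins <- as.numeric(difftime(max(onsite$Finish), min(onsite$Start),
                                 units = "mins"))

# Sum of the individual rows: only the time actually logged as OnSite.
sum_mins <- sum(as.numeric(difftime(onsite$Finish, onsite$Start,
                                    units = "mins")))

print(c(span = span_mins, sum = sum_mins))
```

The span comes out at 22.5 minutes, while the row-by-row sum is about 12.2 minutes, because the span includes the travel time between the two visits.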

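I thought of creating a new column, WeekOne$Level, that accounts for repeated visits and changes in Location. I could then iterate over each Level and use the code above. For example: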

> head(WeekOne, 15) 
       Start    Finish Day  ID Location Level 
1 2017-04-12 00:00:00 2017-04-12 00:02:55 D1 Daniel Office 1 
2 2017-04-12 00:02:55 2017-04-12 00:06:18 D1 Daniel Office 1 
3 2017-04-12 00:06:18 2017-04-12 00:08:20 D1 Daniel OnSite 2 
4 2017-04-12 00:08:20 2017-04-12 00:08:40 D1 Daniel OnSite 2 
5 2017-04-12 00:08:40 2017-04-12 00:10:11 D1 Daniel Travel 3 
6 2017-04-12 00:10:11 2017-04-12 00:10:18 D1 Daniel Travel 3 
7 2017-04-12 00:10:18 2017-04-12 00:17:52 D1 Daniel Travel 3 
8 2017-04-12 00:17:52 2017-04-12 00:19:00 D1 Daniel Travel 3 
9 2017-04-12 00:19:00 2017-04-12 00:19:56 D1 Daniel OnSite 4 
10 2017-04-12 00:19:56 2017-04-12 00:28:48 D1 Daniel OnSite 4 
11 2017-04-12 00:00:00 2017-04-12 00:03:52 D2 Daniel OnSite 5 
12 2017-04-12 00:03:52 2017-04-12 00:04:05 D2 Daniel Office 6 
13 2017-04-12 00:04:05 2017-04-12 00:08:32 D2 Daniel Office 6 
14 2017-04-12 00:08:32 2017-04-12 00:16:01 D2 Daniel Travel 7 
15 2017-04-12 00:16:01 2017-04-12 00:25:35 D2 Daniel OnSite 8 

WeekOne %>% 
    group_by(ID, Level) %>% 
    summarise(Duration = max(Finish) - min(Start)) 

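However, I am not sure how to even add this column, and the approach does not account for Location, seems cumbersome, and still returns the duration in a funny format rather than minutes.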
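
For reference, a run-based Level column like the one in the example could be built with base R's `rle()`. This is only a sketch: it assumes the rows are already sorted by ID, Day, and Start, and the helper names `key` and `runs` are invented for the illustration. Only the Day, ID, and Location columns are rebuilt here, since they are all the counter needs.

```r
# Day, ID and Location for the 15 sample rows shown in the question.
WeekOne <- data.frame(
  Day      = rep(c("D1", "D2"), times = c(10, 5)),
  ID       = "Daniel",
  Location = c(rep(c("Office", "OnSite", "Travel", "OnSite"),
                   times = c(2, 2, 4, 2)),
               "OnSite", "Office", "Office", "Travel", "OnSite")
)

# A new Level starts whenever ID, Day, or Location differs from the
# previous row. Assumes rows are sorted by ID, Day, and Start.
key  <- paste(WeekOne$ID, WeekOne$Day, WeekOne$Location, sep = "|")
runs <- rle(key)
WeekOne$Level <- rep(seq_along(runs$lengths), times = runs$lengths)

print(WeekOne$Level)
```

On the sample rows this reproduces the Level values 1 to 8 shown in the table. Including Day in the key is what separates row 10 (OnSite on D1) from row 11 (OnSite on D2).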

My question

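How can I quickly and easily calculate the total duration that each ID spends in each Location over time? I would like the duration in minutes, rounded to the nearest minute. For example: 3 minutes.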

Answer

You first need to calculate the duration of each row, then sum it by ID and Location:

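library(dplyr)

WeekOne %>% 
     # Row duration in seconds; explicit units avoid difftime's automatic unit choice
     mutate(Duration = as.numeric(difftime(Finish, Start, units = "secs"))) %>% 
     group_by(ID, Location) %>% 
     # Convert the summed seconds to minutes, rounded to one decimal place
     summarize(Total_Duration = round(sum(Duration) / 60, 1)) 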
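
As a worked check of this approach, the sketch below rebuilds the 15 sample rows from the question and reports whole minutes, as the question asks. It assumes Start and Finish are POSIXct and that dplyr 1.0.0 or later is installed (for the `.groups` argument); `Minutes`, `Total_Minutes`, and `totals` are names chosen for this example only.

```r
library(dplyr)

# Sample data rebuilt from the question; POSIXct columns are an assumption.
finish <- c("00:02:55", "00:06:18", "00:08:20", "00:08:40", "00:10:11",
            "00:10:18", "00:17:52", "00:19:00", "00:19:56", "00:28:48",  # D1
            "00:03:52", "00:04:05", "00:08:32", "00:16:01", "00:25:35")  # D2
start  <- c("00:00:00", finish[1:9], "00:00:00", finish[11:14])

WeekOne <- data.frame(
  Start    = as.POSIXct(paste("2017-04-12", start),  tz = "UTC"),
  Finish   = as.POSIXct(paste("2017-04-12", finish), tz = "UTC"),
  Day      = rep(c("D1", "D2"), times = c(10, 5)),
  ID       = "Daniel",
  Location = c(rep(c("Office", "OnSite", "Travel", "OnSite"),
                   times = c(2, 2, 4, 2)),
               "OnSite", "Office", "Office", "Travel", "OnSite")
)

totals <- WeekOne %>%
  # Ask difftime for minutes directly instead of dividing afterwards.
  mutate(Minutes = as.numeric(difftime(Finish, Start, units = "mins"))) %>%
  group_by(ID, Location) %>%
  # Sum the row-level minutes, then round to the nearest whole minute.
  summarise(Total_Minutes = round(sum(Minutes)), .groups = "drop")

print(totals)
```

For these rows Daniel's totals are 11 minutes in Office, 26 in OnSite, and 18 in Travel. Adding Day to `group_by()` would give per-day totals instead. Because every row is summed individually, repeated visits to the same Location need no extra Level column.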

What format is 'Total_Duration' in? For example, I get a number such as 933.50000238419, but how do I get 'Total_Duration' in minutes? – user2716568

It looks like you are in seconds at the moment, so just divide by 60 to get the number of minutes. –