2016-01-22 69 views
0

我正在尋找一種有效的方法來計算給定日期的時間差。我的數據如下所示:R中相鄰時間戳之間的時間差

Time_formatted Date Hours_diff 
    1/20/2016 19:19 1/20/2016 0:46 
    1/20/2016 18:33 1/20/2016 2:43 
    1/20/2016 15:50 1/20/2016 1:28 
    1/20/2016 14:22 1/20/2016 1:50 
    1/20/2016 12:32 1/20/2016 4:52 
    1/20/2016 7:40 1/20/2016 0 
    1/19/2016 23:23 1/19/2016 
    1/19/2016 23:06 1/19/2016 
    1/19/2016 22:37 1/19/2016 
    1/19/2016 21:56 1/19/2016 
    1/19/2016 21:05 1/19/2016 
    1/19/2016 17:53 1/19/2016 
    1/19/2016 17:39 1/19/2016 
    1/19/2016 17:01 1/19/2016 
    1/19/2016 15:31 1/19/2016 

我首先想到的是使用ave功能R:

ave(data$Time_formatted, data$Date, FUN=difftime) 

,但一個明顯的錯誤,會出現使用此功能Error in as.POSIXct(time2) : argument "time2" is missing, with no default。由於數據非常大,循環效率低下。 任何想法來解決這類問題?

+1

我不知道,我知道你想要什麼。你想爲每個日期在相鄰時間戳之間的所有時間差異?或者每天只有一個號碼? – Stibu

+0

嗨,你在第一個假設中是正確的,我編輯過這個帖子來顯示一個硬編碼的例子。最大的訣竅是要處理一天中沒有時間差異的第一時間...... –

回答

1

首先,我準備數據。我要確保日期和時間由列Time_formatted存儲在正確的格式和排序的數據幀:

# convert times to POSIXct, dates to Date 
data$Time_formatted <- as.POSIXct(data$Time_formatted, format = "%m/%d/%Y %H:%M") 
data$Date <- as.Date(data$Date, format = "%m/%d/%Y") 
# sort 
data <- data[order(data$Time_formatted), ] 

然後我結合使用tapply()diff()來計算以分鐘爲單位的差異。請注意,我以佔首次每一天,其中的時間差不確定增加一個額外的零:

my_diff <- function(x, ...) { 
    c(0, diff(x, ...)) 
} 
diffs <- unlist(tapply(data$Time_formatted, data$Date, my_diff)) 

的最後一步是從分鐘的時間差轉換爲%H:%M如下(見this answer的使用formatC()):

mins2hm <- function(min) { 
    h <- min %/% 60 
    m <- min %% 60 
    hm <- paste(h, formatC(m, width = 2, flag = 0), sep = ":") 
} 
data$diffs <- mins2hm(diffs) 
data 
##   Time_formatted  Date diffs 
## 15 2016-01-19 15:31:00 2016-01-19 0:00 
## 14 2016-01-19 17:01:00 2016-01-19 1:30 
## 13 2016-01-19 17:39:00 2016-01-19 0:38 
## 12 2016-01-19 17:53:00 2016-01-19 0:14 
## 11 2016-01-19 21:05:00 2016-01-19 3:12 
## 10 2016-01-19 21:56:00 2016-01-19 0:51 
## 9 2016-01-19 22:37:00 2016-01-19 0:41 
## 8 2016-01-19 23:06:00 2016-01-19 0:29 
## 7 2016-01-19 23:23:00 2016-01-19 0:17 
## 6 2016-01-20 07:40:00 2016-01-20 0:00 
## 5 2016-01-20 12:32:00 2016-01-20 4:52 
## 4 2016-01-20 14:22:00 2016-01-20 1:50 
## 3 2016-01-20 15:50:00 2016-01-20 1:28 
## 2 2016-01-20 18:33:00 2016-01-20 2:43 
## 1 2016-01-20 19:19:00 2016-01-20 0:46 
+0

謝謝,我喜歡在這個例子中集成'tapply'! –

1

步驟和輸出如下:

> df<-read.csv("data.txt",header=T,stringsAsFactors=F) 
> df$Time_formatted<-strptime(a$Time_formatted,"%m/%d/%Y %H:%M") 
> df$Date   <-strptime(a$Date,"%m/%d/%Y") 
> df<-df[order(df$Time_formatted,decreasing=T),] #Make sure it is ordered 
> df 

     Time_formatted  Date 
1 2016-01-20 19:19:00 2016-01-20 
2 2016-01-20 18:33:00 2016-01-20 
3 2016-01-20 15:50:00 2016-01-20 
4 2016-01-20 14:22:00 2016-01-20 
5 2016-01-20 12:32:00 2016-01-20 
6 2016-01-20 07:40:00 2016-01-20 
7 2016-01-19 23:23:00 2016-01-19 
8 2016-01-19 23:06:00 2016-01-19 
9 2016-01-19 22:37:00 2016-01-19 
10 2016-01-19 21:56:00 2016-01-19 
11 2016-01-19 21:05:00 2016-01-19 
12 2016-01-19 17:53:00 2016-01-19 
13 2016-01-19 17:39:00 2016-01-19 
14 2016-01-19 17:01:00 2016-01-19 
15 2016-01-19 15:31:00 2016-01-19 


> df$Hours_diff<-c(-diff(df$Time_formatted),0) # calculate time difference 
> df[which(diff(df$Date)!=0),"Hours_diff"]<-0 # set the last timepoint in day to 0 
> df$Hours_diff<-ifelse(df$Hours_diff>0,paste(floor(df$Hours_diff/60),df$Hours_diff%%60,sep=":"),0) 
> df 

     Time_formatted  Date Hours_diff 
1 2016-01-20 19:19:00 2016-01-20  0:46 
2 2016-01-20 18:33:00 2016-01-20  2:43 
3 2016-01-20 15:50:00 2016-01-20  1:28 
4 2016-01-20 14:22:00 2016-01-20  1:50 
5 2016-01-20 12:32:00 2016-01-20  4:52 
6 2016-01-20 07:40:00 2016-01-20   0 
7 2016-01-19 23:23:00 2016-01-19  0:17 
8 2016-01-19 23:06:00 2016-01-19  0:29 
9 2016-01-19 22:37:00 2016-01-19  0:41 
10 2016-01-19 21:56:00 2016-01-19  0:51 
11 2016-01-19 21:05:00 2016-01-19  3:12 
12 2016-01-19 17:53:00 2016-01-19  0:14 
13 2016-01-19 17:39:00 2016-01-19  0:38 
14 2016-01-19 17:01:00 2016-01-19  1:30 
15 2016-01-19 15:31:00 2016-01-19   0