2014-10-03 46 views
4

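R - Aligning time series with different frequencies

I have two .csv files, each containing one of the two separate time series shown at the bottom. I can import these into R as data frames: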

data1 <- read.csv("data1.csv")
data2 <- read.csv("data2.csv")

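Each data frame holds date, time and price information. I want to combine the prices from data1 and data2 in a single table at a common frequency of 10 seconds.

The two time series have different start and end dates and times, their frequencies (say, the number of observations per day) are different, and the start and end times on each day also differ.

I tried using ts(), but I don't think this function can handle both dates and times.

What is the most efficient way to align these time series to a common frequency?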

data1.csv:

date,time,price 
01/06/2014,05:59:42,1954.75 
01/06/2014,06:00:05,1954.875 
01/06/2014,06:00:06,1954.75 
01/06/2014,06:00:08,1954.875 
01/06/2014,06:02:05,1954.625 
01/06/2014,06:02:22,1954.875 
01/06/2014,06:03:12,1954.75 
01/06/2014,06:03:14,1954.625 
01/06/2014,06:03:20,1954.75 
01/06/2014,06:03:22,1954.875 
01/06/2014,06:03:23,1954.75 
01/06/2014,06:03:26,1954.875 
01/06/2014,06:07:07,1955.125 
01/06/2014,06:07:21,1954.875 
01/06/2014,06:08:54,1954.625 
01/06/2014,06:16:55,1954.375 
01/06/2014,06:17:00,1954.625 
01/06/2014,06:21:46,1954.875 
01/06/2014,06:28:11,1955.125 
01/06/2014,06:30:23,1955.375 
01/06/2014,06:30:49,1955.125 
01/06/2014,06:33:33,1955.375 
01/06/2014,06:34:30,1955.125 
01/06/2014,06:37:39,1955.375 
01/06/2014,06:37:43,1955.125 
01/06/2014,06:47:42,1954.875 
01/06/2014,06:50:23,1955.125 
01/06/2014,06:57:10,1954.875 
01/06/2014,06:57:12,1955.125 
01/06/2014,07:00:08,1954.875 
01/06/2014,07:00:21,1955.125 
01/06/2014,07:00:55,1955.375 
01/06/2014,07:01:19,1955.125 
01/06/2014,07:01:51,1955.375 
02/06/2014,05:59:50,1966.625 
02/06/2014,06:00:00,1966.375 
02/06/2014,06:00:07,1966.5 
02/06/2014,06:00:08,1966.625 
02/06/2014,06:00:10,1966.375 
02/06/2014,06:00:33,1966.125 
02/06/2014,06:00:34,1966.375 
02/06/2014,06:00:41,1966.125 
02/06/2014,06:00:48,1966.375 
02/06/2014,06:02:48,1966.625 
02/06/2014,06:03:24,1966.875 
02/06/2014,06:04:23,1967.125 
02/06/2014,06:04:39,1966.875 
02/06/2014,06:05:28,1966.625 
02/06/2014,06:06:25,1966.375 
02/06/2014,06:07:44,1966.625 

data2.csv:

date,time,price 
01/06/2014,02:05:25,0 
01/06/2014,06:00:07,3231.5 
01/06/2014,06:00:17,3232.5 
01/06/2014,06:00:19,3231.5 
01/06/2014,06:00:33,3232.5 
01/06/2014,06:00:40,3231.5 
01/06/2014,06:00:41,3232.5 
01/06/2014,06:00:42,3231.5 
01/06/2014,06:00:44,3232.5 
01/06/2014,06:04:06,3233.5 
01/06/2014,06:04:22,3232.5 
01/06/2014,06:04:42,3233.5 
01/06/2014,06:08:48,3232.5 
01/06/2014,06:10:12,3231.5 
01/06/2014,06:10:35,3232.5 
01/06/2014,06:21:45,3233.5 
01/06/2014,06:21:55,3234.5 
01/06/2014,06:29:00,3235.5 
01/06/2014,06:33:34,3236.5 
01/06/2014,06:34:30,3235.5 
01/06/2014,06:41:33,3234.5 
01/06/2014,06:47:42,3233.5 
01/06/2014,06:48:33,3234.5 
01/06/2014,06:50:23,3235.5 
01/06/2014,06:52:04,3236.5 
01/06/2014,06:57:11,3235.5 
01/06/2014,07:00:00,3236.5 
01/06/2014,07:00:06,3235.5 
01/06/2014,07:00:08,3233.5 
01/06/2014,07:00:09,3234.5 
01/06/2014,07:00:10,3233.5 
01/06/2014,07:00:11,3234.5 
01/06/2014,07:00:21,3235.5 
02/06/2014,06:00:10,3252.5 
02/06/2014,06:00:20,3252 
02/06/2014,06:00:21,3251.5 
02/06/2014,06:00:33,3250.5 
02/06/2014,06:00:34,3251 
02/06/2014,06:00:35,3250.5 
02/06/2014,06:00:41,3249.5 
02/06/2014,06:01:31,3250.5 
02/06/2014,06:01:32,3249.5 
02/06/2014,06:01:38,3250.5 
02/06/2014,06:02:47,3251.5 
02/06/2014,06:05:32,3250.5 
02/06/2014,06:06:25,3249.5 
02/06/2014,06:07:44,3250.5 
02/06/2014,06:08:11,3249.5 
02/06/2014,06:12:32,3250.5 
02/06/2014,06:16:56,3251.5 
02/06/2014,06:17:08,3250.5 
02/06/2014,06:18:32,3251.5 
02/06/2014,06:31:59,3250.5 
02/06/2014,06:32:11,3251.5 
02/06/2014,06:44:47,3250.5 
02/06/2014,06:45:09,3251.5 
02/06/2014,06:52:33,3252.5 
02/06/2014,06:52:36,3253.5 
02/06/2014,06:55:30,3254.5 
02/06/2014,06:55:39,3253.5 
02/06/2014,06:57:27,3254.5 
02/06/2014,07:00:01,3253.5 
02/06/2014,07:00:02,3254.5 
02/06/2014,07:00:17,3253.5 
02/06/2014,07:00:23,3252.5 

This is what the data frame 'data1' looks like:

date  time    Price 
1 2014-06-01 06:03:59.614000  62.1250 
2 2014-06-01 06:03:59.692000  62.2500 
3 2014-06-01 06:15:42.004000  62.2375 
4 2014-06-01 06:15:42.083000  61.9250 
5 2014-06-01 06:17:01.654000  61.9125 
6 2014-06-01 06:17:01.732000  61.9000 
7 2014-06-01 06:23:41.908000  61.8200 
8 2014-06-01 06:23:41.986000  61.8570 
9 2014-06-01 06:23:55.211000  61.9065 
10 2014-06-01 06:23:55.291000  61.8725 
11 2014-06-01 06:24:11.679000  61.8715 

Answers

3

An example data set

date_time <- seq.POSIXt(as.POSIXct("2014-01-06 06:00:00"), as.POSIXct("2014-01-07 07:00:00"), by = "1 sec")
date_time_1 <- sample(date_time, 100)
date_time_2 <- sample(date_time, 100)

# date and time of data1 both come from the same sampled timestamps
data1 <- data.frame(date = format(date_time_1, "%Y-%m-%d"),
      time = format(date_time_1, "%H:%M:%S"),
      price = rnorm(100)
)
# combine the date and time into a single POSIXct date-time
data1$datetime <- as.POSIXct(paste(data1$date, data1$time), format = "%Y-%m-%d %H:%M:%S")

# data2 uses the second, independent sample of timestamps
data2 <- data.frame(date = format(date_time_2, "%Y-%m-%d"),
        time = format(date_time_2, "%H:%M:%S"),
        price = rnorm(100)
)
# combine the date and time into a single POSIXct date-time
data2$datetime <- as.POSIXct(paste(data2$date, data2$time), format = "%Y-%m-%d %H:%M:%S")

The next part answers your question

## Round the times down to 10 second increments
data1$datetime <- data1$datetime - as.numeric(format(data1$datetime, "%S"))%%10 
data2$datetime <- data2$datetime - as.numeric(format(data2$datetime, "%S"))%%10 

## Aggregate the data in case there are multiple observations in one 10 second block 
data1_freq <- aggregate(data1$price, list(date=as.POSIXct(data1$datetime)), mean) 
data2_freq <- aggregate(data2$price, list(date=as.POSIXct(data2$datetime)), mean) 

### Now merge the two data sets - not dropping any observations 
data <- merge(data2_freq, data1_freq, by="date", all = TRUE) 
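
The floor, aggregate and merge steps above can also be sketched on a few rows taken from the question's CSV files. This is only an illustration in base R: `floor_time` is a hypothetical helper name, the `d/m/Y` date format and the UTC time zone are assumptions, and the flooring is done on the full timestamp (seconds since the epoch) rather than on the seconds field alone.

```r
# Toy data: a few rows copied from data1.csv and data2.csv in the question
d1 <- data.frame(date = "01/06/2014",
                 time = c("06:00:05", "06:00:06", "06:00:08", "06:02:05"),
                 price = c(1954.875, 1954.75, 1954.875, 1954.625))
d2 <- data.frame(date = "01/06/2014",
                 time = c("06:00:07", "06:00:17", "06:00:19"),
                 price = c(3231.5, 3232.5, 3231.5))

# Hypothetical helper: parse date + time (assumed d/m/Y) and floor to `width` seconds
floor_time <- function(date, time, width = 10, tz = "UTC") {
  dt <- as.POSIXct(paste(date, time), format = "%d/%m/%Y %H:%M:%S", tz = tz)
  as.POSIXct(floor(as.numeric(dt) / width) * width, origin = "1970-01-01", tz = tz)
}

d1$bin <- floor_time(d1$date, d1$time)
d2$bin <- floor_time(d2$date, d2$time)

# Mean price per 10-second bin, then a full outer join on the bin
a1 <- aggregate(d1["price"], by = list(bin = d1$bin), FUN = mean)
a2 <- aggregate(d2["price"], by = list(bin = d2$bin), FUN = mean)
aligned <- merge(a1, a2, by = "bin", all = TRUE, suffixes = c("_1", "_2"))
print(aligned)
```

Bins in which only one series has observations keep NA for the other series, which makes the gaps between the two series visible in the merged table.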

Optionally, you can then merge it onto a complete time series

## create a continuous date based on the desired freq (here 10 seconds) 
cont_date_time <- data.frame(date = 
           seq.POSIXt(as.POSIXlt("2014-01-06 06:00:00"), 
              as.POSIXlt("2014-01-07 07:00:00"), 
              by = "10 secs") 
) 

# And merge the data with the complete time series 
data_cont <- merge(data, cont_date_time, by = "date", all=TRUE) 

To restrict the continuous time series to weekdays and working hours

## create a continuous date based on the desired freq (here 10 seconds) 
cont_date_time <- data.frame(date = 
           seq.POSIXt(as.POSIXlt("2014-01-06 06:00:00"), 
              as.POSIXlt("2014-01-07 07:00:00"), 
              by = "10 secs") 
) 
# Use the lubridate package to subset the date sequence 

library(lubridate) 
## Use the wday function to keep Monday to Friday (wday() returns 1 for Sunday by default)
## drop = FALSE keeps a one-column data frame instead of collapsing it to a vector
cont_date_time <- cont_date_time[with(cont_date_time, wday(date)>=2&wday(date)<=6), , drop = FALSE]
## Use the hour function to keep working hours (assumed here to be 09:00 up to 16:59)
cont_date_time <- cont_date_time[with(cont_date_time, hour(date)>=9&hour(date)<=16), , drop = FALSE]

# And merge the data with the complete time series 
data_cont <- merge(data, cont_date_time, by = "date", all=TRUE) 
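
The same idea can be narrowed to a fixed daily window such as 06:00:00 to 06:30:00. The following is a minimal base R sketch under stated assumptions: the grid dates, the UTC time zone and the names `grid`, `clock` and `grid_window` are illustrative and not part of the answer above.

```r
# Assumed toy grid: 10-second steps spanning two days
grid <- data.frame(date = seq(as.POSIXct("2014-06-01 05:00:00", tz = "UTC"),
                              as.POSIXct("2014-06-02 08:00:00", tz = "UTC"),
                              by = "10 sec"))

# Zero-padded "HH:MM:SS" strings sort in chronological order,
# so the clock time can be compared as text
clock <- format(grid$date, "%H:%M:%S")
in_window <- clock >= "06:00:00" & clock <= "06:30:00"

# drop = FALSE keeps a one-column data frame so it can still be merged by "date"
grid_window <- grid[in_window, , drop = FALSE]
```

Merging the aggregated prices onto `grid_window` with `all.y = TRUE` instead of `all = TRUE` would keep only the rows that fall inside the window.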
+0

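Thanks Jase_, I edited the question while you were answering. I changed my data.csv example to include different dates (see the question). Can I account for different dates and times with this approach? – Rhubarb 2014-10-03 14:59:47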

+1

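Yes, that should not be a problem once the date and time are combined into a POSIXt format and treated as a single value. I have updated my answer to show this. – 2014-10-03 15:01:21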

+0

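Thanks again, I will accept your answer. One last thing: I do need the continuous solution you give at the bottom (thx), but only within market hours (e.g. '06.00.00 - 06.30.00 AM') on each day. Would I create the continuous data between the start and end times (as you did) and then filter out rows 'if datetime < 06.00 and datetime > 06.30'? How can I do that? – Rhubarb 2014-10-06 08:45:29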

2

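This is easiest if you use a time series representation. Here we read the data into zoo objects. index = 1:2 tells it that the first two columns contain the index, FUN=f specifies a conversion function that converts the index to "chron" class and truncates it to 10 minutes, and agg=mean specifies the function used to aggregate the data. Then we can merge the zoo objects: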

library(zoo) 
library(chron) 

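# the files store dates as day/month/year, so state the format instead of relying on chron's m/d/y default
f <- function(d, t) trunc(chron(as.character(d), as.character(t), format = c(dates = "d/m/y", times = "h:m:s")), "00:10:00")

# index and agg are partially matched to read.zoo's index.column and aggregate arguments
z1 <- read.zoo("data1.csv", header=TRUE, sep=",", index=1:2, FUN=f, agg=mean)
z2 <- read.zoo("data2.csv", header=TRUE, sep=",", index=1:2, FUN=f, agg=mean)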

merge(z1, z2) 

This gives:

      z1  z2 
(01/06/14 02:00:00)  NA 0.000 
(01/06/14 05:50:00) 1954.750  NA 
(01/06/14 06:00:00) 1954.804 3232.333 
(01/06/14 06:10:00) 1954.500 3232.000 
(01/06/14 06:20:00) 1955.000 3234.500 
(01/06/14 06:30:00) 1955.250 3236.000 
(01/06/14 06:40:00) 1954.875 3234.167 
(01/06/14 06:50:00) 1955.042 3235.833 
(01/06/14 07:00:00) 1955.175 3234.786 
(02/06/14 05:50:00) 1966.625  NA 
(02/06/14 06:00:00) 1966.533 3250.633 
(02/06/14 06:10:00)  NA 3251.000 
(02/06/14 06:30:00)  NA 3251.000 
(02/06/14 06:40:00)  NA 3251.000 
(02/06/14 06:50:00)  NA 3253.700 
(02/06/14 07:00:00)  NA 3253.500 
+0

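I am trying to read from a data frame rather than from a .csv. I have added what the 'data1' data frame looks like to my question. I tried: 'read.zoo(data1, index = 1:2, FUN = f, agg = mean)' but get an error saying the 'm/d/y' format is incorrect. What am I doing wrong? Do I need to specify the format in 'f'? – Rhubarb 2014-10-03 16:32:25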

+0

It depends on what you have. If 'DF' is your data frame, what is the output of 'dput(DF)'? – 2014-10-03 20:06:44

+0

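The output of 'dput(data1)' is several pages long. But 'sapply(data1, class)' gives: 'date = "factor" time = "factor" price = "numeric"'. Is that where I am going wrong? I have listed the output of 'data1' at the bottom of my question. – Rhubarb 2014-10-06 07:18:28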
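
For the data frame described in this last comment, the error most likely comes from the date layout: the printed rows use year-month-day dates and fractional seconds, which an m/d/y parser would reject. The following is a minimal base R sketch of one way to parse and bin such columns. It does not use read.zoo, and the three rows, the UTC time zone and the names `stamp`, `bin` and `binned` are assumptions made for illustration.

```r
# Assumed stand-in for the data frame in the question (factor date and time columns)
data1 <- data.frame(date = factor(c("2014-06-01", "2014-06-01", "2014-06-01")),
                    time = factor(c("06:03:59.614000", "06:03:59.692000", "06:15:42.004000")),
                    price = c(62.125, 62.25, 62.2375))

# Convert the factors to character, then parse; %OS reads fractional seconds on input
stamp <- as.POSIXct(paste(as.character(data1$date), as.character(data1$time)),
                    format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Floor to 10-second bins and average the prices within each bin
bin <- as.POSIXct(floor(as.numeric(stamp) / 10) * 10, origin = "1970-01-01", tz = "UTC")
binned <- aggregate(data1["price"], by = list(datetime = bin), FUN = mean)
print(binned)
```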