2011-03-18

R time series, complex sequence

I am trying to merge two different time series in R. The data have the following characteristics:

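  1. The data must fall between 08:30 and 15:00 on each day.
  2. The data spans several weeks, not just a single day.
  3. The data has gaps at random intervals.
  4. The two datasets will not necessarily have gaps at the same time intervals.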

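I want to merge the two datasets onto a sequence that contains every time from 08:30 to 15:00. Wherever either series has a gap, I want the previous value (or the following value) carried over.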

# I have verified that the csv files are imported correctly 
# The first column contains dates, and the strptime 
# function converts those strings into date/time objects. 
# 
sec1_dates <- strptime(sec1[,1], "%m/%d/%Y %H:%M:%S") 
sec2_dates <- strptime(sec2[,1], "%m/%d/%Y %H:%M:%S") 

# The second column contains the close. 
# I use the zoo function to create zoo objects from that data. 
# But for some reason this ends up creating duplicates PROBLEM 1 
# 
a <- zoo(sec1[,2], sec1_dates) 
b <- zoo(sec2[,2], sec2_dates) 

# I know that I need use seq to fill in gaps but I am clueless as to how 
# Once I have the proper seq I can just use na.locf to fill the appropriate values 
# HOWEVER seq(start(sec1_dates), end(sec1_dates), "min") would end up returning 
# every minute for each day, and I only want 08:30 to 15:30. PROBLEM 2 

# The merge function can combine two zoo objects, in union 
# Obviously this fails because the two index sizes don't match PROBLEM 3 
# 
t.zoo <- merge(a, b, all=TRUE) 

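James, you were right about problem 1. Thank you. I confirmed that the CSV file was pulling the data twice, and removing the duplicated rows fixed it. I also used your solution for problem 2, but I am not sure it is the most efficient way to do what I want. Eventually I will probably want to use this to run regressions, and at that point I may need some kind of loop to pull in an arbitrary number of datasets. Any suggestions for optimizing this would be much appreciated.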
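For the "arbitrary number of datasets" case, one option is to keep the series in a named list and hand the whole list to merge through do.call, instead of writing a loop. This is only a sketch: the names sec1, sec2 and sec3 and the toy values are made up for illustration, and the timestamps are assumed to be in UTC.

```r
library(zoo)

# Hypothetical toy series with partly different timestamps.
t0 <- as.POSIXct("2011-03-18 08:30:00", tz = "UTC")
series <- list(
  sec1 = zoo(c(1, 2),   t0 + c(0, 60)),    # 08:30, 08:31
  sec2 = zoo(c(10, 20), t0 + c(60, 120)),  # 08:31, 08:32
  sec3 = zoo(100,       t0 + 120)          # 08:32
)

# do.call passes every list element to merge() as a separate argument,
# so the same line works for two series or for twenty.
merged <- do.call(merge, series)
```

The result has one column per list element and one row per distinct timestamp, so it can go straight into na.locf(merged, na.rm = FALSE).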

Updated solution

library(zoo) 
library(tseries) 

# Read the CSV files into data frames 
sec1 <- read.csv("C:\\exportdata\\sec1.csv", stringsAsFactors=FALSE, header=FALSE) 
sec2 <- read.csv("C:\\exportdata\\sec2.csv", stringsAsFactors=FALSE, header=FALSE) 

# The first column contains dates. 
# I use strptime to tell it what format these appear in. 
sec1_dates <- strptime(sec1[,1], "%m/%d/%Y %H:%M:%S") 
sec2_dates <- strptime(sec2[,1], "%m/%d/%Y %H:%M:%S") 

# The second column contains the close prices for the securities. 
# I use the zoo function to create zoo objects from that data. 
# Input = a vector of data and a vector of dates. 
a <- zoo(sec1[,2], sec1_dates) 
b <- zoo(sec2[,2], sec2_dates) 

# create a discrete time-series with the exact time frame desired 
# per tip from James 
template <- zoo(NULL, seq(sec1_dates[1], tail(sec1_dates, 1), "min")) 
hhmm <- strftime(time(template), "%H:%M") 
template <- template[which(hhmm > "08:30" & hhmm < "15:00")] 

# The merge function is then used to merge 
# 1) each security to the template (uses the discrete date/time range) 
# 2) remove the column of data from template (used only for dates) 
# 3) each security to one another (this was the ultimate goal anyway). 
a.zoo <- merge(a, template, all=TRUE) 
a.zoo$template <- NULL 
b.zoo <- merge(b, template, all=TRUE) 
b.zoo$template <- NULL 
t.zoo <- merge(a.zoo, b.zoo, all=TRUE) 

# Fill all NA elements with the closest non NA value. 
t <- na.locf(t.zoo) 
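
The same grid-then-fill idea can be tried out on toy data before running it on the real files. The sketch below makes several assumptions that are not in the code above: it uses POSIXct timestamps in UTC, made-up values for a and b, inclusive 08:30 and 15:00 bounds, and the zero-width zoo(, grid) form for the template so that no extra column needs to be removed.

```r
library(zoo)

tz <- "UTC"  # assumption: all timestamps are treated as UTC

# Hypothetical toy series covering two days, with different gaps.
a <- zoo(c(1, 2), as.POSIXct(c("2011-03-17 08:30:00", "2011-03-18 14:59:00"), tz = tz))
b <- zoo(c(5, 6), as.POSIXct(c("2011-03-17 08:31:00", "2011-03-18 15:00:00"), tz = tz))

# Every minute over the whole span, then keep only 08:30 to 15:00 (inclusive).
grid <- seq(as.POSIXct("2011-03-17 08:30:00", tz = tz),
            as.POSIXct("2011-03-18 15:00:00", tz = tz), by = "min")
hhmm <- format(grid, "%H:%M", tz = tz)
grid <- grid[hhmm >= "08:30" & hhmm <= "15:00"]

# A zero-width zoo object contributes its timestamps but no data column.
a_grid <- merge(a, zoo(, grid), all = TRUE)
b_grid <- merge(b, zoo(, grid), all = TRUE)
both   <- merge(a_grid, b_grid, all = TRUE)

# Carry the last observation forward; na.rm = FALSE keeps leading NA rows.
filled <- na.locf(both, na.rm = FALSE)
```

Note that a gap at the start of a series stays NA here, because there is no earlier value to carry forward; na.locf(..., fromLast = TRUE) fills from the following value instead.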

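-1 Please illustrate the problem by providing sample data. Use 'dput' to do that. Show what you get and how it differs from what you want. "Obviously fails" is not obvious at all. 'merge.zoo' does not require matching indexes. – 2011-03-18 11:35:32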
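
The point about merge.zoo can be checked on a small example. This is a sketch with made-up values and UTC timestamps, not the data from the question: the two series have different lengths and only one timestamp in common.

```r
library(zoo)

# Hypothetical series with different lengths and only partly shared timestamps.
t0 <- as.POSIXct("2011-03-18 08:30:00", tz = "UTC")
a <- zoo(c(1, 2, 3), t0 + c(0, 60, 120))  # 08:30, 08:31, 08:32
b <- zoo(c(10, 20),  t0 + c(60, 180))     # 08:31, 08:33

# all = TRUE keeps the union of both indexes; cells with no observation become NA.
m <- merge(a, b, all = TRUE)
```

The merged object has four rows, one per distinct timestamp, and NA wherever one of the series has no observation.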

Answers


Problem 1

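See ?zoo for details on how duplicates are handled, but this is probably happening because the dates you create with strptime contain duplicates.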
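
A quick way to test that guess is to look for repeated timestamps before building the zoo object. The sketch below uses a made-up stand-in for sec1 in which one row appears twice, and it assumes UTC; keeping the first row per timestamp is just one possible policy.

```r
# Hypothetical toy data standing in for sec1: the 08:31 row appears twice.
sec1 <- data.frame(
  stamp = c("03/18/2011 08:30:00", "03/18/2011 08:31:00",
            "03/18/2011 08:31:00", "03/18/2011 08:32:00"),
  close = c(10.0, 10.1, 10.1, 10.2),
  stringsAsFactors = FALSE
)

# as.POSIXct gives a plain timestamp vector, which is convenient as an index.
sec1_dates <- as.POSIXct(strptime(sec1[, 1], "%m/%d/%Y %H:%M:%S", tz = "UTC"))

# Count timestamps that repeat an earlier one.
n_dup <- sum(duplicated(sec1_dates))

# Keep only the first row for each timestamp.
keep        <- !duplicated(sec1_dates)
sec1_clean  <- sec1[keep, ]
dates_clean <- sec1_dates[keep]
```

If n_dup is greater than zero, the duplicates come from the input rather than from zoo.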

Problem 2

You can subset a zoo object by time of day using [, which and time; see ?zoo. For example:

t.zoo[which(strftime(time(t.zoo),"%H:%M")>"08:30" & strftime(time(t.zoo),"%H:%M")<"15:30")] 
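
One detail worth checking on a small example is what happens at the boundaries. Because the times are compared as zero-padded "HH:MM" strings, > and < leave out the boundary minutes themselves, while >= and <= keep them. The series below is made up for illustration and assumes UTC.

```r
library(zoo)

# Hypothetical one-minute series from 08:29 to 08:32 on a single day.
idx <- seq(as.POSIXct("2011-03-18 08:29:00", tz = "UTC"),
           as.POSIXct("2011-03-18 08:32:00", tz = "UTC"), by = "min")
z <- zoo(1:4, idx)

# Time of day as zero-padded "HH:MM" text, so string comparison orders correctly.
hhmm <- format(time(z), "%H:%M", tz = "UTC")

strict    <- z[hhmm >  "08:30" & hhmm <  "15:30"]  # drops the 08:30 observation
inclusive <- z[hhmm >= "08:30" & hhmm <= "15:30"]  # keeps the 08:30 observation
```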

Problem 3

Use c to combine them: t.zoo <- c(a,b)


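James, thank you very much for your help! I used your solutions for problems 1 and 2, and I am fairly sure that using c in place of the final merge will speed up my code. Do you have any further changes to suggest? (Note: I updated the code above) – saneshark 2011-03-18 22:42:00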