2016-07-28

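Creating an efficient loop in R

I have this file, which contains three .csv files: EURAUD_201501, EURAUD_201502 and EURAUD_201503. These files hold forex trading data for January through March 2015. The first step of my exercise is to get the data set into a workable form. The (working) code: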

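#Entering and editing data by hand# 
library(xts) #provides xts() and rbind.xts() 
library(highfrequency) #assumed source of aggregatets(); not named in the original post 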

data1<-read.csv("EURAUD_201501.csv", header = FALSE, col.names = c("TIMESTAMP", "BID", "OFR", "VOL"), stringsAsFactors = FALSE) 
data1$VOL<- NULL #drops the VOL column 
data1$TIMESTAMP = sub('(?<=.{11})', ':', data1$TIMESTAMP, perl=TRUE) #manipulate the strings to create clear timestamps 
data1$TIMESTAMP = sub('(?<=.{14})', ':', data1$TIMESTAMP, perl=TRUE) 
data1$TIMESTAMP = sub('(?<=.{17})', '.', data1$TIMESTAMP, perl=TRUE) 
xts_data1 = xts(data1[,c(2,3)], order.by = as.POSIXct(data1$TIMESTAMP, tz = "EST", format = "%Y%m%d %H:%M:%OS")) #Convert file to an xts object 
rm(data1) #remove data1 object in order to save space 

data2<-read.csv("EURAUD_201502.csv", header = FALSE, col.names = c("TIMESTAMP", "BID", "OFR", "VOL"), stringsAsFactors = FALSE) 
data2$VOL<- NULL #drops the VOL column 
data2$TIMESTAMP = sub('(?<=.{11})', ':', data2$TIMESTAMP, perl=TRUE) #manipulate the strings to create clear timestamps 
data2$TIMESTAMP = sub('(?<=.{14})', ':', data2$TIMESTAMP, perl=TRUE) 
data2$TIMESTAMP = sub('(?<=.{17})', '.', data2$TIMESTAMP, perl=TRUE) 
xts_data2 = xts(data2[,c(2,3)], order.by = as.POSIXct(data2$TIMESTAMP, tz = "EST", format = "%Y%m%d %H:%M:%OS")) #Convert file to an xts object 
rm(data2) #remove data2 object in order to save space 

data3<-read.csv("EURAUD_201503.csv", header = FALSE, col.names = c("TIMESTAMP", "BID", "OFR", "VOL"), stringsAsFactors = FALSE) 
data3$VOL<- NULL #drops the VOL column 
data3$TIMESTAMP = sub('(?<=.{11})', ':', data3$TIMESTAMP, perl=TRUE) #manipulate the strings to create clear timestamps 
data3$TIMESTAMP = sub('(?<=.{14})', ':', data3$TIMESTAMP, perl=TRUE) 
data3$TIMESTAMP = sub('(?<=.{17})', '.', data3$TIMESTAMP, perl=TRUE) 
xts_data3 = xts(data3[,c(2,3)], order.by = as.POSIXct(data3$TIMESTAMP, tz = "EST", format = "%Y%m%d %H:%M:%OS")) #Convert file to an xts object 
rm(data3) #remove data3 object in order to save space 

#Create 5-minute intervals 
final_xts = rbind.xts(xts_data1, xts_data2, xts_data3) 
rm(xts_data1, xts_data2, xts_data3) #remove the monthly xts objects in order to save space 
final_fivemin = aggregatets(final_xts, FUN = "previoustick", on = "minutes", k = 5) 

How can I write a loop so that I don't have to repeat the same steps for each data set?

Answer

It looks like you may want to try lapply. You could use

#real_data is not defined in the original answer; it is assumed here to be the vector of file names 
real_data <- c("EURAUD_201501.csv", "EURAUD_201502.csv", "EURAUD_201503.csv") 
xts_data <- lapply(real_data, function(x){ 
    data <- read.csv(x, header = FALSE, col.names = c("TIMESTAMP", "BID", "OFR", "VOL"), 
     stringsAsFactors = FALSE) 
    data$VOL <- NULL #drops the VOL column 
    data$TIMESTAMP = sub('(?<=.{11})', ':', data$TIMESTAMP, perl=TRUE) #manipulate the strings to create clear timestamps 
    data$TIMESTAMP = sub('(?<=.{14})', ':', data$TIMESTAMP, perl=TRUE) 
    data$TIMESTAMP = sub('(?<=.{17})', '.', data$TIMESTAMP, perl=TRUE) 
    #Convert file to an xts object 
    return(xts(data[,c(2,3)], 
     order.by = as.POSIXct(data$TIMESTAMP, tz = "EST", format = "%Y%m%d %H:%M:%OS"))) 
}) 

in place of the for loop, and then finish with:

#Create 5-minute intervals 
final_xts = do.call(rbind, xts_data) 
final_fivemin = aggregatets(final_xts, FUN = "previoustick", on = "minutes", k = 5) 
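
The three sub() calls used to clean the timestamps could probably be collapsed into a single substitution. The sketch below uses base R only and assumes the raw stamps look like "YYYYMMDD HHMMSSmmm", which is inferred from the offsets 11, 14 and 17 and from the format string, not confirmed against the data. The helper name format_stamp and the sample value are hypothetical.

```r
# Hypothetical helper: insert ':', ':' and '.' in one pass.
# Assumes raw stamps look like "YYYYMMDD HHMMSSmmm".
format_stamp <- function(x) {
  sub("^(.{11})(.{2})(.{2})(.*)$", "\\1:\\2:\\3.\\4", x)
}

raw <- "20150101 130015123"   # made-up example value
clean <- format_stamp(raw)
clean                         # "20150101 13:00:15.123"

# The cleaned string parses with the same format string used for xts above.
parsed <- as.POSIXct(clean, tz = "EST", format = "%Y%m%d %H:%M:%OS")
```

Because sub() is vectorised, format_stamp(data$TIMESTAMP) could stand in for the three separate assignments inside the lapply function, provided every row follows the assumed layout.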

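Thank you for your answer. I think you are heading in the right direction, but it needs a bit more work to get the desired result. If you run my first working code block, it creates matrices [data frames is, I think, the proper term]. I ran your code, but unfortunately it doesn't return anything for me. I will edit my original question to make it clearer. – Greconomist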

I don't have your data, so I can't run your code. But `lapply` returns a list; to get a data frame, you can run `final_xts <- do.call(rbind, xts_data)` after the code block above. –
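
To illustrate the point about lapply returning a list, here is a minimal base R sketch that can be run without the downloaded files. It reads two in-memory stand-ins for the monthly CSVs (the values are made up) and stacks them with do.call(rbind, ...). It uses plain data frames, not xts objects, so it only shows the list-then-combine pattern.

```r
# Two tiny in-memory "files" standing in for the monthly CSVs (made-up values).
csv_texts <- c(
  "20150101 130015123,1.4801,1.4803,0",
  "20150201 090000500,1.4650,1.4652,0"
)

# lapply() applies the same reader to every element and returns a list.
pieces <- lapply(csv_texts, function(txt) {
  read.csv(text = txt, header = FALSE,
           col.names = c("TIMESTAMP", "BID", "OFR", "VOL"),
           stringsAsFactors = FALSE)
})

# do.call(rbind, ...) stacks the list elements into one object.
combined <- do.call(rbind, pieces)

class(pieces)    # "list"
class(combined)  # "data.frame"
nrow(combined)   # 2
```

With the real files, the same two steps apply to the list of xts objects, where rbind dispatches to the xts method.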

If you click 'I have **this file**' at the top of my post, you can download the data. – Greconomist