我改變了一點點的OP的原始數據集下面的教學/教學方面的原因:
df <- data.frame(
X1=c("GBP/USD"),
X2=c("2017-06-01 00:00:00", "2017-06-01 00:00:00", "2017-06-01 00:00:01", "2017-06-01 00:00:01", "2017-06-01 00:00:01", "2017-06-01 00:00:02", "2017-06-30 20:59:52", "2017-06-30 20:59:54", "2017-06-30 20:59:54", "2017-06-30 20:59:56", "2017-06-30 20:59:56", "2017-06-30 20:59:56"),
X3=c(1.28756, 1.28754, 1.28754, 1.28753, 1.28752, 1.28757, 1.30093, 1.30121, 1.30100, 1.30146, 1.30145,1.30145),
X4=c(1.28763, 1.28760, 1.28759, 1.28758, 1.28755, 1.28760,1.30300, 1.30300, 1.30390, 1.30452, 1.30447, 1.30447),
stringsAsFactors=FALSE)
df
X1 X2 X3 X4
1 GBP/USD 2017-06-01 00:00:00 1.28756 1.28763
2 GBP/USD 2017-06-01 00:00:00 1.28754 1.28760
3 GBP/USD 2017-06-01 00:00:01 1.28754 1.28759
4 GBP/USD 2017-06-01 00:00:01 1.28753 1.28758
5 GBP/USD 2017-06-01 00:00:01 1.28752 1.28755
6 GBP/USD 2017-06-01 00:00:02 1.28757 1.28760
7 GBP/USD 2017-06-30 20:59:52 1.30093 1.30300
8 GBP/USD 2017-06-30 20:59:54 1.30121 1.30300
9 GBP/USD 2017-06-30 20:59:54 1.30100 1.30390
10 GBP/USD 2017-06-30 20:59:56 1.30146 1.30452
11 GBP/USD 2017-06-30 20:59:56 1.30145 1.30447
12 GBP/USD 2017-06-30 20:59:56 1.30145 1.30447
現在,在低頻的數據,將有成爲相同事物的分組。所以,我們必須找到對應唯一startings指數,以及各組的結局:
indices <- seq_along(df[,2])[!(duplicated(df[,2]))] # 1 3 6 7 8 10; the beginnings of groups (observations)
indices - 1 # 0 2 5 6 7 9; for finding the endings of groups
numberoflowfreq <- length(indices) # 6: number of groupings (obs.) for Low Freq data
公然寫明白的模式:
mean(df[1:((indices -1)[2]),3]) # from 1 to 2
mean(df[indices[2]:((indices -1)[3]),3]) # from 3 to 5
mean(df[indices[3]:((indices -1)[4]),3]) # from 6 to 6
mean(df[indices[4]:((indices -1)[5]),3]) # from 7 to 7
mean(df[indices[5]:((indices -1)[6]),3]) # from 8 to 9
mean(df[indices[6]:nrow(df),3]) # from 10 to 12
簡化模式:
mean3rdColumn_1st <- mean(df[1:((indices -1)[2]),3]) # from 1 to 2
mean3rdColumn_Between <- sapply(2:(numberoflowfreq-1), function(i) mean(df[indices[i]:((indices -1)[i+1]),3]))
mean3rdColumn_Last <- mean(df[indices[6]:nrow(df),3]) # from 10 to 12
# 3rd column in low frequency data:
c(mean3rdColumn_1st, mean3rdColumn_Between, mean3rdColumn_Last)
同樣對於第4列:
mean4thColumn_1st <- mean(df[1:((indices -1)[2]),4]) # from 1 to 2
mean4thColumn_Between <- sapply(2:(numberoflowfreq-1), function(i) mean(df[indices[i]:((indices -1)[i+1]),4]))
mean4thColumn_Last <- mean(df[indices[6]:nrow(df),4]) # from 10 to 12
# 4th column in low frequency data:
c(mean4thColumn_1st, mean4thColumn_Between, mean4thColumn_Last)
收集所有的努力:現在
LowFrqData <- data.frame(X1=c("GBP/USD"), X2=df[indices,2], X3=c(mean3rdColumn_1st, mean3rdColumn_Between, mean3rdColumn_Last), x4=c(mean4thColumn_1st, mean4thColumn_Between, mean4thColumn_Last), stringsAsFactors=FALSE)
LowFrqData
X1 X2 X3 x4
1 GBP/USD 2017-06-01 00:00:00 1.287550 1.287615
2 GBP/USD 2017-06-01 00:00:01 1.287530 1.287573
3 GBP/USD 2017-06-01 00:00:02 1.287570 1.287600
4 GBP/USD 2017-06-30 20:59:52 1.300930 1.303000
5 GBP/USD 2017-06-30 20:59:54 1.301105 1.303450
6 GBP/USD 2017-06-30 20:59:56 1.301453 1.304487
,列X2
具有獨特的分鐘值,X3
和X4
被相關細胞的形成。
另請注意:某個範圍內的所有分鐘數可能不會有值。對於這種情況,您可以抽取NA
。另一方面,在這種情況下,人們可能會忽略不規則的影響,因爲觀察的間隔對於許多觀察來說可能是相同的,因此不是非常不規則。還要考慮使用線性內插將數據轉換爲等距觀測的事實可以引入一些重要且難以量化的偏差(參見:Scholes和Williams)。
M. Scholes and J. Williams, 「Estimating betas from nonsynchronous data」, Journal of Financial Economics 5: 309–327, 1977.
現在,經常5分鐘系列部分:
as.numeric(as.POSIXct("1970-01-01 03:00:00")) # 0; starting point for ZERO seconds. "1970-01-01 03:01:00" equals 60.
as.numeric(as.POSIXct("2017-06-01 00:00:00")) # 1496264400
# Passed seconds after the first observation in the dataset
PassedSecs <- as.numeric(as.POSIXct(LowFrqData$X2)) - 1496264400
LowFrq5minuteRaw <- cbind(LowFrqData, PassedSecs, stringsAsFactors=FALSE)
LowFrq5minuteRaw
X1 X2 X3 x4 PassedSecs
1 GBP/USD 2017-06-01 00:00:00 1.287550 1.287615 0
2 GBP/USD 2017-06-01 00:00:01 1.287530 1.287573 1
3 GBP/USD 2017-06-01 00:00:02 1.287570 1.287600 2
4 GBP/USD 2017-06-30 20:59:52 1.300930 1.303000 2581192
5 GBP/USD 2017-06-30 20:59:54 1.301105 1.303450 2581194
6 GBP/USD 2017-06-30 20:59:56 1.301453 1.304487 2581196
5分鐘裝置5 * 60 = 300秒。因此,「在300分鐘內具有相同的商數」以5分鐘爲間隔對觀測結果進行分組。
LowFrq5minuteRaw2 <- cbind(LowFrqData, PassedSecs, QbyDto300 = PassedSecs%/%300, stringsAsFactors=FALSE)
LowFrq5minuteRaw2
X1 X2 X3 x4 PassedSecs QbyDto300
1 GBP/USD 2017-06-01 00:00:00 1.287550 1.287615 0 0
2 GBP/USD 2017-06-01 00:00:01 1.287530 1.287573 1 0
3 GBP/USD 2017-06-01 00:00:02 1.287570 1.287600 2 0
4 GBP/USD 2017-06-30 20:59:52 1.300930 1.303000 2581192 8603
5 GBP/USD 2017-06-30 20:59:54 1.301105 1.303450 2581194 8603
6 GBP/USD 2017-06-30 20:59:56 1.301453 1.304487 2581196 8603
indices2 <- seq_along(LowFrq5minuteRaw2[,6])[!(duplicated(LowFrq5minuteRaw2[,6]))] # 1 4; the beginnings of groups
LowFrq5minute <- data.frame(X1=c("GBP/USD"), X2=LowFrq5minuteRaw2[indices2,2], X3=aggregate(LowFrqData[,3] ~ QbyDto300, LowFrq5minuteRaw2, mean)[,2], X4=aggregate(LowFrqData[,4] ~ QbyDto300, LowFrq5minuteRaw2, mean)[,2])
LowFrq5minute
X1 X2 X3 X4
1 GBP/USD 2017-06-01 00:00:00 1.287550 1.287596
2 GBP/USD 2017-06-30 20:59:52 1.301163 1.303646
X2
持有5分鐘OBS的趴在區間的代表第一次出現次數的時間戳。
如果包含'head(yourdata)','tail(yourdata)',這將是有用的。此外,imgur.com不起作用。您可以使用任何其他存儲。 –
謝謝,請找到頭(尾巴沒有足夠的空間)。這些數據已經從.csv直接導入filefile#A tibble:6×4 X1 X2 X3 X4 1 GBP/USD 2017年6月1日00:00:00 1.28756 1.28763 2英鎊/美元2017年6月1日00:00:00 1.28754 1.28760 3英鎊/美元2017年6月1日00:00:00 1.28754 1.28759 4英鎊/美元2017年6月1日00:00:00 1.28753 1.28759 5英鎊/美元2017年6月1日00:00:00 1.28753 1.28759 6英鎊/美元2017年6月1日00:00:00 1.28753 1.28759 –
hx1
據此編輯你的問題;不在評論部分,而是在原始問題部分。此外,請使用「{}」代碼符號清楚地顯示您的數據。的[四捨五入時間最近的一刻鐘] –