5

Using GROUP BY with FIRST_VALUE and LAST_VALUE

I'm working with some data that is currently stored at 1-minute intervals and looks like this:

CREATE TABLE #MinuteData 
    (
     [Id] INT , 
     [MinuteBar] DATETIME , 
     [Open] NUMERIC(12, 6) , 
     [High] NUMERIC(12, 6) , 
     [Low] NUMERIC(12, 6) , 
     [Close] NUMERIC(12, 6) 
    ); 

INSERT INTO #MinuteData 
     ([Id], [MinuteBar], [Open], [High], [Low], [Close]) 
VALUES (1, '2015-01-01 17:00:00', 1.557870, 1.557880, 1.557870, 1.557880), 
     (2, '2015-01-01 17:01:00', 1.557900, 1.557900, 1.557880, 1.557880), 
     (3, '2015-01-01 17:02:00', 1.557960, 1.558070, 1.557960, 1.558040), 
     (4, '2015-01-01 17:03:00', 1.558080, 1.558100, 1.558040, 1.558050), 
     (5, '2015-01-01 17:04:00', 1.558050, 1.558100, 1.558020, 1.558030), 
     (6, '2015-01-01 17:05:00', 1.558580, 1.558710, 1.557870, 1.557950), 
     (7, '2015-01-01 17:06:00', 1.557910, 1.558120, 1.557910, 1.557990), 
     (8, '2015-01-01 17:07:00', 1.557940, 1.558250, 1.557940, 1.558170), 
     (9, '2015-01-01 17:08:00', 1.558140, 1.558200, 1.558080, 1.558120), 
     (10, '2015-01-01 17:09:00', 1.558110, 1.558140, 1.557970, 1.557970); 

SELECT * 
FROM #MinuteData; 

DROP TABLE #MinuteData; 

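The values track currency exchange rates, so for each minute interval (bar), Open is the price at the start of the minute and Close is the price at the end of the minute. The High and Low values represent the highest and lowest rate during each individual minute.

Desired output

I'm looking to reformat this data into 5-minute intervals to produce the following output: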

MinuteBar                Open      Close     Low       High
2015-01-01 17:00:00.000  1.557870  1.558030  1.557870  1.558100
2015-01-01 17:05:00.000  1.558580  1.557970  1.557870  1.558710

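This takes the Open value from the first minute of the five and the Close value from the last minute of the five. The High and Low values represent the highest High and the lowest Low rate across the 5-minute period.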
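
As a minimal sketch of the bucketing this implies: integer division of the minute offset by 5 gives an interval number, and multiplying back gives the start of the 5-minute bar. The anchor date and the names @Anchor and BucketStart are assumptions for illustration only; any fixed anchor that falls on a 5-minute boundary should behave the same way.

```sql
-- Sketch: map individual minutes to their 5-minute bucket.
-- @Anchor and BucketStart are hypothetical names, not part of the original schema.
DECLARE @Anchor DATETIME = '2015-01-01T00:00:00';

SELECT  m.MinuteBar,
        -- INT / INT is integer division in T-SQL, so minutes 0-4 share one number
        DATEDIFF(MINUTE, @Anchor, m.MinuteBar) / 5 AS Interval,
        -- multiply back and add to the anchor to recover the bucket's start time
        DATEADD(MINUTE, DATEDIFF(MINUTE, @Anchor, m.MinuteBar) / 5 * 5, @Anchor) AS BucketStart
FROM (VALUES (CAST('2015-01-01T17:04:00' AS DATETIME)),
             (CAST('2015-01-01T17:05:00' AS DATETIME)),
             (CAST('2015-01-01T17:09:00' AS DATETIME))) AS m (MinuteBar);

-- 17:04 -> Interval 204, BucketStart 17:00
-- 17:05 -> Interval 205, BucketStart 17:05
-- 17:09 -> Interval 205, BucketStart 17:05
```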

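Current solution

I have a solution that does this (below), but it feels inelegant because it relies on Id values and self joins. I also intend to run it on much larger datasets, so I'm looking for a more efficient way to do this, if possible: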

-- Create a column to allow grouping in 5 minute Intervals 
SELECT Id, MinuteBar, [Open], High, Low, [Close], 
DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar)/5 AS Interval 
INTO #5MinuteData 
FROM #MinuteData 
ORDER BY minutebar 

-- Group by interval and aggregate prior to self join 
SELECT Interval , 
     MIN(MinuteBar) AS MinuteBar , 
     MIN(Id) AS OpenId , 
     MAX(Id) AS CloseId , 
     MIN(Low) AS Low , 
     MAX(High) AS High 
INTO #DataMinMax 
FROM #5MinuteData 
GROUP BY Interval; 

-- Self join to get the Open and Close values 
SELECT t1.Interval , 
     t1.MinuteBar , 
     tOpen.[Open] , 
     tClose.[Close] , 
     t1.Low , 
     t1.High 
FROM #DataMinMax t1 
     INNER JOIN #5MinuteData tOpen ON tOpen.Id = OpenId 
     INNER JOIN #5MinuteData tClose ON tClose.Id = CloseId; 

DROP TABLE #DataMinMax 
DROP TABLE #5MinuteData 

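Rework attempt

Instead of the queries above, I've been looking at using FIRST_VALUE and LAST_VALUE, as they seem to be what I'm after, but I can't quite get them to work with the grouping I'm doing. There may be a better solution than what I'm attempting, so I'm open to suggestions. Currently I'm trying to do this: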

SELECT MIN(MinuteBar) MinuteBar5 , 
     FIRST_VALUE([Open]) OVER (ORDER BY MinuteBar) AS Opening, 
     MAX(High) AS High , 
     MIN(Low) AS Low , 
     LAST_VALUE([Close]) OVER (ORDER BY MinuteBar) AS Closing , 
     DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar)/5 AS Interval 
FROM #MinuteData 
GROUP BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar)/5 

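This gives me the error below, which is related to FIRST_VALUE and LAST_VALUE, as the query runs if I remove those lines:

Column '#MinuteData.MinuteBar' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.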

+1

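FIRST_VALUE and LAST_VALUE aren't actually aggregate functions like you might think. They're more like ROW_NUMBER in that they are applied over a full data set. The problem is that you're trying to use them like aggregates, which is why it's yelling at you. I have to head out right now, but my first thought would be to convert the date to a string, pull out the minute component, and stick the parts back together in rounded form. – Xedni

+0

Thanks for the reply. I've not used FIRST_VALUE in anger before. The dates aren't my issue here; I have a solution for those that seems to be working, although there may be a better way to do it. The main problem is getting the Open and Close values for the 5-minute period. – Tanner

+0

So High and Low are the max and min respectively, but Open and Close are what you're having trouble with, because those should be the first and last in the interval regardless of their value? Do I have that right? – Xedni

Answers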

2
SELECT 
    MIN(MinuteBar) AS MinuteBar5, 
    Opening, 
    MAX(High) AS High, 
    MIN(Low) AS Low, 
    Closing, 
    Interval 
FROM 
(
    SELECT FIRST_VALUE([Open]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar)/5 ORDER BY MinuteBar) AS Opening, 
      FIRST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar)/5 ORDER BY MinuteBar DESC) AS Closing, 
      DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar)/5 AS Interval, 
      * 
    FROM #MinuteData 
) AS T 
GROUP BY Interval, Opening, Closing 

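Here is a solution close to your current one. There are two places where you went wrong.

  1. FIRST_VALUE and LAST_VALUE are analytic functions, which work on a window or partition rather than on a group. You can run the nested query on its own and look at its result.
  2. LAST_VALUE returns the last value of the current window, which your query does not specify, and the default window runs from the first row of the current partition to the current row. You can either use FIRST_VALUE with a descending order, or specify the window: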

    LAST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar)/5 
          ORDER BY MinuteBar 
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Closing, 
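
To see the default-frame behaviour from point 2 in isolation, here is a small sketch on three made-up rows (the derived table v and its column n are purely illustrative, not part of the question's data). With only ORDER BY, the frame ends at the current row, so LAST_VALUE simply returns the current row's value; with the explicit frame it returns the last value in the partition.

```sql
-- Sketch: default window frame vs. an explicit full-partition frame.
SELECT  v.n,
        -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        LAST_VALUE(v.n) OVER (ORDER BY v.n) AS DefaultFrame,
        -- explicit frame covering the whole partition
        LAST_VALUE(v.n) OVER (ORDER BY v.n
                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS FullFrame
FROM (VALUES (1), (2), (3)) AS v (n);

-- n  DefaultFrame  FullFrame
-- 1  1             3
-- 2  2             3
-- 3  3             3
```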
    
+0

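Thanks, I'll give this a try shortly and get back to you. I suspected I might need to do something like this. – Tanner

+0

Nice. I hadn't thought of adding Opening and Closing to the GROUP BY clause. –

+0

This seems to be the simplest solution. It doesn't need as many steps as the others and is the closest to what I was trying to achieve, although I'm still looking into why the LAST_VALUE part wasn't working. Nonetheless, this works, many thanks. – Tanner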

1

Here is one way to do it without temporary tables:

;WITH CTEInterval AS 
( -- This replaces your first temporary table (#5MinuteData) 
    SELECT [Id], 
      [MinuteBar], 
      [Open], 
      [High], 
      [Low], 
      [Close], 
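      DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar)/5 AS Interval -- minutes since a fixed anchor, so buckets stay distinct across hours and days 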
    FROM #MinuteData 
), CTEOpenClose as 
(-- this is instead of your second temporary table (#DataMinMax) 
    SELECT [Id], 
      [MinuteBar], 
      FIRST_VALUE([Open]) OVER (PARTITION BY Interval ORDER BY MinuteBar) As [Open], 
      [High], 
      [Low], 
      FIRST_VALUE([Close]) OVER (PARTITION BY Interval ORDER BY MinuteBar DESC) As [Close], 
      Interval 
    FROM CTEInterval 
) 

-- This is the final select 
SELECT MIN([MinuteBar]) as [MinuteBar], 
     AVG([Open]) as [Open], -- All values of [Open] in the same interval are the same... 
     AVG([Close]) as [Close], -- All values of [Close] in the same interval are the same... 
     MIN([Low]) as [Low], 
     MAX([High]) as [High] 
FROM CTEOpenClose 
GROUP BY Interval 

Result:

MinuteBar                Open      Close     Low       High
2015-01-01 17:00:00.000  1.557870  1.558030  1.557870  1.558100
2015-01-01 17:05:00.000  1.558580  1.557970  1.557870  1.558710
+0

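Thanks, I'm in a meeting at the moment and will test this shortly. I'm hoping to reduce the number of steps and to have it perform well on 600,000+ records. I'll test later. – Tanner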

1

Demo here

;with cte 
as 
(--this can be your permanent table of intervals, rather than generating them on the fly 
select cast('2015-01-01 17:00:00.000' as datetime) as interval,dateadd(mi,5,'2015-01-01 17:00:00.000') as nxtinterval 
union all 
select dateadd(mi,5,interval),dateadd(mi,5,nxtinterval) from cte 
where interval<='2015-01-01 17:45:00.000' 

) 
,finalcte 
as 
(select minutebar, 
low,high, 
dense_rank() over (order by interval,nxtinterval) as grpd, 
first_value([close]) over (partition by interval order by minutebar desc) as [close], -- latest bar in the interval 
first_value([open]) over (partition by interval order by minutebar) as [open] -- earliest bar in the interval 
from cte c 
join 
#minutedata m 
on m.minutebar >= c.interval and m.minutebar < c.nxtinterval -- half-open range: a bar on a boundary joins one interval only 
) 
select 
min(minutebar) as minutebar, 
min(low) as 'low', 
max(high) as 'High', 
max([open]) as 'open', 
max([close]) as 'close' 
from finalcte 
group by grpd 
+0

How did you get the "Demo here" button? – Xedni

+1

@Xedni: Use some reserved text. – TheGameiswar