2014-06-23
4

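What is the best way to summarize interval data?

Given a table defined with arbitrary intervals (NOT dates/times!!) and data such as the following: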

START float 
END float 
VALUE varchar(40) 

For example:

START END VALUE 
----- --- ------ 
0  1  Banana 
1  3  Banana 
3  4  Orange 
4  7  Orange 
7  8  Apple 
8  9  Apple 
9  10  Apple 
10  15  Apple 
20  22  Apple 
22  23  Apple 
23  28  Banana 
28  30  Banana 
etc.. 

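How can I summarize the data so that each run of contiguous intervals with the same value is listed only once? That is, the query result should look like this: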

START  END VALUE 
-----  --- ------ 
0  3  Banana 
3  7  Orange 
7  15  Apple 
20  23  Apple 
23  30  Banana 

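Note the gap between 15 and 20 above. I'm working with quite a lot of data (~500k rows), but the query is not run frequently, so efficiency would be nice to have. Can this be done without using a cursor?

(Note: I'm using SQL Server 2008 R2, so I can't take advantage of newer features, if any exist.)

Thanks!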

+0

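This is why this site is still my go-to place. A few hours later, I have several good responses. Thank you all! I'll try to wrap my head around the different options and will accept one of them shortly. –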

+0

Couldn't agree more. My own answer aside, the others are all great. – JiggsJedi

+0

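All good answers, thanks! I'm not qualified to comment on the elegance/efficiency of the various options, so I accepted @adrianm's answer because I could adapt it to my needs and it ran the fastest. –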

Answers

1
WITH TableWithPreviousAndNext AS (
    SELECT CA1.[Previous] 
      ,Table1.[Start] 
      ,Table1.[End] 
      ,CA2.[Next] 
      ,Table1.[Value] 
      ,(1 + ROW_NUMBER() OVER (PARTITION BY [Value] ORDER BY Table1.[Start]))/2 AS [Group] 
    FROM Table1 
     CROSS APPLY (
      SELECT MAX([End]) AS [Previous] 
      FROM Table1 AS InnerTable1 
      WHERE InnerTable1.[Value] = Table1.[Value] 
        AND InnerTable1.[Start] < Table1.[Start] 
     ) AS CA1 
     CROSS APPLY (
      SELECT MIN([Start]) AS Next 
      FROM Table1 AS InnerTable1 
      WHERE InnerTable1.[Value] = Table1.[Value] 
        AND InnerTable1.[Start] > Table1.[Start] 
     ) AS CA2 
     CROSS APPLY (-- A little trick to create a 2 row group for isolated rows 
      SELECT 1 AS Dummy 
      UNION ALL 
      SELECT 1 
      WHERE ([Previous] IS NULL OR [Previous] <> [Start]) 
        AND ([Next] IS NULL OR [Next] <> [End]) 
     ) AS CA3 
    WHERE [Previous] IS NULL -- Remove all but first and last in sequence 
      OR [Next] IS NULL 
      OR [Previous] <> [Start] 
      OR [End] <> [Next] 
) 
SELECT MIN([Start]) 
     ,MAX([End]) 
     ,[Value] 
FROM TableWithPreviousAndNext 
GROUP BY [Value] 
     ,[Group] 
ORDER BY MIN(Start) 
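
To try the query above, Table1 has to exist first. The following is only a minimal setup sketch: the table name comes from the query, while the column types and sample rows are taken from the question. The NOT NULL constraints are an assumption.

```sql
-- Setup sketch: column types follow the question; NOT NULL is an assumption.
CREATE TABLE Table1
(   [Start] FLOAT       NOT NULL,
    [End]   FLOAT       NOT NULL,
    [Value] VARCHAR(40) NOT NULL
);

-- Sample rows from the question (note the gap between 15 and 20).
INSERT INTO Table1 ([Start], [End], [Value])
VALUES (0, 1, 'Banana'),  (1, 3, 'Banana'),
       (3, 4, 'Orange'),  (4, 7, 'Orange'),
       (7, 8, 'Apple'),   (8, 9, 'Apple'),
       (9, 10, 'Apple'),  (10, 15, 'Apple'),
       (20, 22, 'Apple'), (22, 23, 'Apple'),
       (23, 28, 'Banana'), (28, 30, 'Banana');
```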
3

This should work for you:

DECLARE @T TABLE (Start INT, [End] INT, Value VARCHAR(100)); 
INSERT @T (Start, [End], Value) 
VALUES 
    (0, 1, 'Banana'), (1, 3, 'Banana'), (3, 4, 'Orange'), (4, 7, 'Orange'), 
    (7, 8, 'Apple'), (8, 9, 'Apple'), (9, 10, 'Apple'), (10, 15, 'Apple'), 
    (20, 22, 'Apple'), (22, 23, 'Apple'), (23, 28, 'Banana'), (28, 30, 'Banana'); 

WITH CTE AS 
( SELECT t.[Start], 
      t.[End], 
      t.[value], 
      IsStart = ISNULL(c.IsStart, 1) 
    FROM @T AS T 
      OUTER APPLY 
      ( SELECT TOP 1 IsStart = 0 
       FROM @T AS T2 
       WHERE T2.Value = T.Value 
       AND  T2.[End] = T.Start 
      ) AS c 
) 
SELECT Value, Start = MIN(Start), [End] = MAX([End]) 
FROM CTE AS T 
     OUTER APPLY 
     ( SELECT SUM(IsStart) 
      FROM CTE AS T2 
      WHERE T2.Value = T.Value 
      AND  T2.Start <= T.Start 
     ) g (GroupingSet) 
GROUP BY Value, GroupingSet 
ORDER BY Start; 

The first step is to identify each record that is the start of a new range. This part:

SELECT t.[Start], 
     t.[End], 
     t.[value], 
     IsStart = ISNULL(c.IsStart, 1) 
FROM @T AS T 
     OUTER APPLY 
     ( SELECT TOP 1 IsStart = 0 
      FROM @T AS T2 
      WHERE T2.Value = T.Value 
      AND  T2.[End] = T.Start 
     ) AS c 

will give:

Start  End  value   IsStart
0      1    Banana  1
1      3    Banana  0
3      4    Orange  1
4      7    Orange  0
7      8    Apple   1
8      9    Apple   0
9      10   Apple   0
10     15   Apple   0
20     22   Apple   1
22     23   Apple   0
23     28   Banana  1
28     30   Banana  0

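You can then create unique groups by counting the number of ranges that started at or before the current record, which is essentially a running total of the IsStart column partitioned by Value. This is what is being done here: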

SELECT * 
FROM CTE AS T 
     OUTER APPLY 
     ( SELECT SUM(IsStart) 
      FROM CTE AS T2 
      WHERE T2.Value = T.Value 
      AND  T2.Start <= T.Start 
     ) g (GroupingSet); 

and gives:

Start  End  value   IsStart  GroupingSet
0      1    Banana  1        1
1      3    Banana  0        1
3      4    Orange  1        1
4      7    Orange  0        1
7      8    Apple   1        1
8      9    Apple   0        1
9      10   Apple   0        1
10     15   Apple   0        1
20     22   Apple   1        2 -- SECOND NON CONTINUOUS RANGE FOR APPLES
22     23   Apple   0        2
23     28   Banana  1        2 -- SECOND NON CONTINUOUS RANGE FOR BANANAS
28     30   Banana  0        2

Finally, you can aggregate, grouping by Value and this identifier column, to identify the unique groups.

You can also do this by expanding each range into rows, by cross joining to a numbers table (for brevity I have used master..spt_values):

WITH CTE AS 
( SELECT t.[value], 
      Number = t.Start + v.Number, 
      GroupingSet = t.Start + v.Number - ROW_NUMBER() OVER(PARTITION BY t.[value] ORDER BY t.Start + v.Number) 
    FROM @T AS T 
      INNER JOIN Master..spt_values v 
       ON v.[Type] = 'P' 
       AND v.Number < (t.[End] - t.[Start]) 
) 
SELECT Value, [Start] = MIN(Number), [End] = MAX(Number) + 1 -- Number is the start of the last unit, so add 1 to recover [End]
FROM CTE 
GROUP BY GroupingSet, Value; 

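The downside of this is that it is likely to be memory intensive if you have a lot of rows or large ranges. After expanding the ranges, this just uses Itzik Ben-Gan's Gaps and Islands Solutions.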
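
The core of the gaps-and-islands trick used in the expanded query can be seen on a plain list of integers. This is only a minimal sketch: the table variable @N and its contents are hypothetical and not part of the original answer. It relies only on ROW_NUMBER(), so it should run on SQL Server 2008.

```sql
-- Hypothetical sample: integers with a gap between 3 and 7.
DECLARE @N TABLE (n INT PRIMARY KEY);
INSERT @N (n) VALUES (1), (2), (3), (7), (8);

WITH Marked AS
(   SELECT n,
           -- Inside a run of consecutive integers, n and ROW_NUMBER() increase
           -- together, so their difference stays constant; it changes after a gap.
           Island = n - ROW_NUMBER() OVER (ORDER BY n)
    FROM @N
)
SELECT IslandStart = MIN(n), IslandEnd = MAX(n)
FROM Marked
GROUP BY Island
ORDER BY IslandStart;
-- Returns two rows: (1, 3) and (7, 8)
```

Here the difference is 0 for 1, 2 and 3, and 3 for 7 and 8, so grouping on it separates the two islands. The expanded-range query applies the same idea per Value via PARTITION BY.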

+1

Wish I could +2 for the good explanation. –

1

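I have a headache after trying to work out a ranking-function approach for this...

I couldn't figure out a gaps-and-islands technique for this data without expanding it into adjacent rows.

Here is my solution: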

DECLARE @Fruits TABLE ([Start] FLOAT, [End] FLOAT, Value NVARCHAR(MAX)) 
INSERT INTO @Fruits 
SELECT 0,1,'Banana' UNION 
SELECT 1,3,'Banana' UNION 
SELECT 3,4,'Orange' UNION 
SELECT 4,7,'Orange' UNION 
SELECT 7,8,'Apple' UNION 
SELECT 8,9,'Apple' UNION 
SELECT 9,10,'Apple' UNION 
SELECT 10,15,'Apple' UNION 
SELECT 20,22,'Apple' UNION 
SELECT 22,23,'Apple' UNION 
SELECT 23,28,'Banana' UNION 
SELECT 28,30,'Banana' 

;WITH ExpandCTE AS 
(
    SELECT 1 AS SPLITNUM, 
      [End]-Start DURATION, 
      Start, 
      Start+1 AS [End], 
      Value 
    FROM @Fruits 
    UNION ALL 
    SELECT SPLITNUM+1, 
      DURATION, 
      Start+1 AS Start, 
      Start+2 AS [End], 
      Value 
    FROM ExpandCTE 
    WHERE SPLITNUM<DURATION 
), 
t1 AS 
(
    SELECT *, 
      START-ROW_NUMBER() OVER(PARTITION BY VALUE ORDER BY START) AS X 
    FROM ExpandCTE 
) 

select MIN(Start) AS Start, MAX([End]) AS [End], Value 
from t1 
GROUP BY Value, X 
ORDER BY Start 
+0

What is this 'CASE WHEN [End] - Start = 1 THEN 1 ELSE [End] - Start END DURATION,' for? –

+0

Leftover junk from testing! Updated. – JiggsJedi

1

With SQL Server 2008, one way to do this is to use a triangular join, with a little twist:

WITH I AS (
    SELECT ID = Row_Number() OVER (ORDER BY Start) 
     , _Start = [Start] 
     , _End = [End] 
     , Value 
    FROM Data 
), D AS (
    SELECT i.ID, i._Start, i._End, i.Value 
     , m.id _id, m.value _value 
     , R = CASE WHEN i.Value <> m.Value THEN 1 
        WHEN m._End <> i._Start THEN 1 
        ELSE 0 
      END 
    FROM I 
     CROSS APPLY (SELECT TOP 1 
          id, _Start, _End, value 
         FROM I m 
         WHERE m.ID IN (i.ID, i.ID - 1) 
         ORDER BY ID) m 
), B AS (
    SELECT i.ID, i._Start, i._End, i.Value 
     , R = SUM(l.R) 
    FROM D i 
     LEFT JOIN D l ON i.id >= l.id 
    GROUP BY i.ID, i._Start, i._End, i.Value 
) 
SELECT [START] = MIN(_Start) 
    , [END] = MAX(_End) 
    , Value 
FROM B 
GROUP BY R, Value 
ORDER BY 1 

SQLFiddle Demo

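The CTE I (ID) creates an ID, which is needed to check whether there is a gap between two consecutive rows (the ID is used to get the correct row in the JOIN).

The CTE D (Data) uses CROSS APPLY to get the previous row (or the same row for the first one), which is equivalent to LAG. The previous row's values are checked to see whether Value has changed or whether there is a gap between the previous [END] and the current [Start].

The CTE B (Block) uses a triangular JOIN between D and itself to create a field that stores the number of changes and gaps from the start up to the current row. This field has the same number for all rows of the same data group.

The main query uses the new column to aggregate the data.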
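
For readers on SQL Server 2012 or later (which the asker's 2008 R2 instance is not), the same idea can be written directly with LAG and a windowed running SUM. This is only a sketch, not part of the original answer: the table variable @Data stands in for the Data table, and the names Flagged, Grouped, IsStart, Grp, RangeStart and RangeEnd are illustrative. Unlike the query above, it partitions by Value instead of ordering all rows together.

```sql
-- Assumes SQL Server 2012+ (LAG and SUM ... OVER (ORDER BY ...) are not in 2008 R2).
-- @Data is a stand-in for the Data table, loaded with the question's sample rows.
DECLARE @Data TABLE ([Start] FLOAT, [End] FLOAT, Value VARCHAR(40));
INSERT @Data ([Start], [End], Value)
VALUES (0, 1, 'Banana'),  (1, 3, 'Banana'),  (3, 4, 'Orange'),   (4, 7, 'Orange'),
       (7, 8, 'Apple'),   (8, 9, 'Apple'),   (9, 10, 'Apple'),   (10, 15, 'Apple'),
       (20, 22, 'Apple'), (22, 23, 'Apple'), (23, 28, 'Banana'), (28, 30, 'Banana');

WITH Flagged AS
(   SELECT [Start], [End], Value,
           -- 0 when this row continues the previous row of the same Value, else 1.
           -- For the first row LAG returns NULL, so the comparison fails and we get 1.
           IsStart = CASE WHEN LAG([End]) OVER (PARTITION BY Value ORDER BY [Start]) = [Start]
                          THEN 0 ELSE 1 END
    FROM @Data
), Grouped AS
(   SELECT [Start], [End], Value,
           -- Running total of range starts: one number per contiguous range.
           Grp = SUM(IsStart) OVER (PARTITION BY Value ORDER BY [Start]
                                    ROWS UNBOUNDED PRECEDING)
    FROM Flagged
)
SELECT RangeStart = MIN([Start]), RangeEnd = MAX([End]), Value
FROM Grouped
GROUP BY Value, Grp
ORDER BY RangeStart;
```

On the sample rows this should return the five ranges listed in the question. The window functions replace both the CROSS APPLY lookup and the triangular join, so each row is visited once per window instead of being joined against all earlier rows.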
