2015-08-25 80 views
3

SQL Server - Complex GROUP BY - Gaps and Islands

I have a table like this:

+-------------+--------------+------------+----------------+ 
| CustomerSID | StartDateSID | EndDateSID | MarketingOptIn | 
+-------------+--------------+------------+----------------+ 
|       12345 |     20101019 |   20131016 | Y              |
|       12345 |     20131017 |   20140413 | Y              |
|       12345 |     20140414 |   20140817 | N              |
|       12345 |     20140818 |   20141228 | N              |
|       12345 |     20141229 |   20150125 | Y              |
|       12345 |     20150126 |          0 | Y              |
+-------------+--------------+------------+----------------+ 

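I need to create a view on top of this table that presents the flag data in the following format: essentially, the periods of time during which the flag was Y or N. (An EndDateSID of 0 means the row is currently active, so it should be treated as today's date.)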

+-------------+--------------+------------+----------------+ 
| CustomerSID | StartDateSID | EndDateSID | MarketingOptIn | 
+-------------+--------------+------------+----------------+ 
|       12345 |     20101019 |   20140413 | Y              |
|       12345 |     20140414 |   20141228 | N              |
|       12345 |     20141229 |   20150825 | Y              |
+-------------+--------------+------------+----------------+ 
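
As a small illustration of the rule in parentheses above (an EndDateSID of 0 stands for "still active" and should read as today's date), here is a Python sketch. The function name `resolve_end_date` and its `today` parameter are made up for this example and are not part of the original schema or query.

```python
from datetime import date


def resolve_end_date(end_date_sid, today=None):
    """Return end_date_sid, replacing the 0 sentinel with today's date as YYYYMMDD."""
    if today is None:
        today = date.today()
    if end_date_sid == 0:
        # 0 marks the currently active row, so substitute today's date
        return int(today.strftime("%Y%m%d"))
    return end_date_sid


# On the day the question was asked, the open-ended row resolves to 20150825
print(resolve_end_date(0, date(2015, 8, 25)))
print(resolve_end_date(20140413, date(2015, 8, 25)))
```

This mirrors what the `ISNULL(NULLIF(EndDateSID, 0), ...)` expression in the query below does on the SQL side.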

Most customers have changed their flag only once, so the following query works for them:

SELECT 
CH1.CustomerSID 
,MIN(CH1.StartDateSID) StartDate 
,MAX(ISNULL(NULLIF(CH1.EndDateSID,0),CONVERT(INT, CONVERT(VARCHAR, GETDATE(), 112)))) EndDate 
,CH1.MarketingOptIn 
FROM DWH.DimCustomerHistory CH1 
GROUP BY CH1.CustomerSID, CH1.MarketingOptIn 
ORDER BY CH1.CustomerSID, CH1.MarketingOptIn 

How can I get the desired output for customers like the one above, who have changed their flag more than once?
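
To make the problem concrete, the following Python sketch emulates what `GROUP BY CustomerSID, MarketingOptIn` does to the sample rows. It is only an illustration: the function name `group_by_flag_only` is hypothetical, and "today" is pinned to 20150825 to match the expected output above.

```python
from collections import defaultdict

TODAY = 20150825  # assumed "today", taken from the expected output in the question

rows = [
    (12345, 20101019, 20131016, "Y"),
    (12345, 20131017, 20140413, "Y"),
    (12345, 20140414, 20140817, "N"),
    (12345, 20140818, 20141228, "N"),
    (12345, 20141229, 20150125, "Y"),
    (12345, 20150126, 0, "Y"),
]


def group_by_flag_only(rows, today=TODAY):
    """Emulate GROUP BY CustomerSID, MarketingOptIn with MIN(start) and MAX(end)."""
    groups = defaultdict(list)
    for customer, start, end, flag in rows:
        # An end date of 0 means "still active", so treat it as today
        groups[(customer, flag)].append((start, end or today))
    return sorted(
        (customer, min(s for s, _ in spans), max(e for _, e in spans), flag)
        for (customer, flag), spans in groups.items()
    )


for row in group_by_flag_only(rows):
    print(row)
```

Both Y periods collapse into a single row spanning 20101019 to 20150825, which overlaps the N period in the middle. That is why grouping by the flag alone is not enough once a customer switches back.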

Edit: Changed the title as suggested by @GarethD to make the question easier for others to find.

+0

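Hi Rohit, welcome to StackOverflow. Next time, try to provide a [**SqlFiddle**](http://sqlfiddle.com/#!15/5368b/6) so we can better understand the problem and give you an answer much faster. Also, please read [**How to ask**](http://stackoverflow.com/help/how-to-ask) and [**How to create a Minimal, Complete, and Verifiable example**](http://stackoverflow.com/help/mcve). –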

+0

Possible duplicate of [Group data by the change of grouping column value in order](http://stackoverflow.com/questions/10110026/group-data-by-the-change-of-grouping-column-value-in-order) – Bulat

+0

Hi @JuanCarlosOropeza, thanks for the suggestions. I'll follow them next time. – Rohit

Answers

3

This is a gaps and islands problem. You need to use ROW_NUMBER() to identify your gaps, so the first stage is:

SELECT CustomerSID, 
     StartDateSID, 
     EndDateSID, 
     MarketingOptIn, 
     TotalRowNum = ROW_NUMBER() OVER(PARTITION BY CustomerSID ORDER BY StartDateSID), 
     RowNumInGroup = ROW_NUMBER() OVER(PARTITION BY CustomerSID, MarketingOptIn ORDER BY StartDateSID), 
     GroupID = ROW_NUMBER() OVER(PARTITION BY CustomerSID ORDER BY StartDateSID) - 
       ROW_NUMBER() OVER(PARTITION BY CustomerSID, MarketingOptIn ORDER BY StartDateSID) 
FROM dbo.YourTable; 

Output:

CustomerSID  StartDateSID  EndDateSID  MarketingOptIn  TotalRowNum  RowNumInGroup  GroupID
------------------------------------------------------------------------------------------
12345        20101019      20131016    Y               1            1              0
12345        20131017      20140413    Y               2            2              0
12345        20140414      20140817    N               3            1              2
12345        20140818      20141228    N               4            2              2
12345        20141229      20150125    Y               5            3              2
12345        20150126      0           Y               6            4              2

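The key here is that by taking the row number of each row, and also the row number of each row within its group, you get a unique identifier (GroupID + MarketingOptIn) for each of your islands. Note that GroupID alone is not unique: the second N island and the second Y island both have GroupID 2, which is why MarketingOptIn must be part of the key. It is then just a case of grouping by this identifier when doing your aggregates: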

FULL WORKING EXAMPLE

DECLARE @T TABLE 
( 
    CustomerSID INT, 
    StartDateSID INT, 
    EndDateSID INT, 
    MarketingOptIn CHAR(1) 
) 
INSERT @T 
VALUES 
    (12345, 20101019, 20131016, 'Y'), 
    (12345, 20131017, 20140413, 'Y'), 
    (12345, 20140414, 20140817, 'N'), 
    (12345, 20140818, 20141228, 'N'), 
    (12345, 20141229, 20150125, 'Y'), 
    (12345, 20150126, 0, 'Y'); 


WITH CTE AS 
(
    SELECT CustomerSID, 
      StartDateSID, 
      EndDateSID, 
      MarketingOptIn, 
      GroupID = ROW_NUMBER() OVER(PARTITION BY CustomerSID ORDER BY StartDateSID) - 
        ROW_NUMBER() OVER(PARTITION BY CustomerSID, MarketingOptIn ORDER BY StartDateSID) 
    FROM @T 
) 
SELECT CustomerSID, 
     StartDateSID = MIN(StartDateSID), 
     EndDateSID = CASE WHEN MIN(EndDateSID) = 0 THEN CONVERT(INT, CONVERT(VARCHAR(8), GETDATE(), 112)) ELSE MAX(EndDateSID) END, 
     MarketingOptIn 
FROM CTE 
GROUP BY CustomerSID, MarketingOptIn, GroupID 
ORDER BY CustomerSID, StartDateSID; 
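
For readers who want to trace the row-number arithmetic by hand, here is a rough Python equivalent of the query above. It is a sketch, not a replacement for the SQL: the function names `add_group_id` and `merge_islands` are invented for this illustration, and "today" is passed in explicitly instead of calling GETDATE().

```python
from collections import defaultdict

rows = [
    (12345, 20101019, 20131016, "Y"),
    (12345, 20131017, 20140413, "Y"),
    (12345, 20140414, 20140817, "N"),
    (12345, 20140818, 20141228, "N"),
    (12345, 20141229, 20150125, "Y"),
    (12345, 20150126, 0, "Y"),
]


def add_group_id(rows):
    """Append GroupID = row number per customer - row number per (customer, flag)."""
    total_rn = defaultdict(int)     # ROW_NUMBER() OVER (PARTITION BY CustomerSID ...)
    in_group_rn = defaultdict(int)  # ROW_NUMBER() OVER (PARTITION BY CustomerSID, MarketingOptIn ...)
    out = []
    for customer, start, end, flag in sorted(rows, key=lambda r: (r[0], r[1])):
        total_rn[customer] += 1
        in_group_rn[(customer, flag)] += 1
        group_id = total_rn[customer] - in_group_rn[(customer, flag)]
        out.append((customer, start, end, flag, group_id))
    return out


def merge_islands(rows, today):
    """Group by (customer, flag, GroupID) and collapse each island to one row."""
    islands = defaultdict(list)
    for customer, start, end, flag, group_id in add_group_id(rows):
        islands[(customer, flag, group_id)].append((start, end))
    result = []
    for (customer, flag, _), spans in islands.items():
        ends = [e for _, e in spans]
        # Same rule as the CASE expression: an island containing a 0 end date is still open
        end = today if min(ends) == 0 else max(ends)
        result.append((customer, min(s for s, _ in spans), end, flag))
    return sorted(result)


for row in merge_islands(rows, today=20150825):
    print(row)
```

With `today=20150825` this yields the three rows from the desired output in the question.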
5

You can use the following query:

SELECT CustomerSID, 
     MIN(StartDateSID) AS StartDate, 
     MAX(ISNULL(NULLIF(EndDateSID,0), 
      CONVERT(INT, CONVERT(VARCHAR, GETDATE(), 112)))) AS EndDate, 
     MarketingOptIn 
FROM (  
    SELECT CustomerSID, StartDateSID, EndDateSID, MarketingOptIn, 
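     -- Number rows per customer so that other customers' rows cannot split an island
     ROW_NUMBER() OVER (PARTITION BY CustomerSID ORDER BY StartDateSID) - 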
     ROW_NUMBER() OVER (PARTITION BY CustomerSID, MarketingOptIn 
          ORDER BY StartDateSID) AS grp  
    FROM DimCustomerHistory) AS t 
GROUP BY CustomerSID, MarketingOptIn, grp 
ORDER BY StartDate 

The calculated field grp is used to identify consecutive records that have the same MarketingOptIn value.

Using this field in the outer query, you can easily GROUP BY and apply the MIN and MAX aggregate functions in the same way as in your original query.
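
One caveat worth checking when the table holds more than one customer: the overall row number should restart for each customer. If it is computed over the whole table, rows belonging to other customers can land between two rows of the same island and change the difference. The Python trace below illustrates this with made-up data; the function name `island_keys`, the `per_customer` switch, and the sample rows are assumptions for this sketch only.

```python
from collections import defaultdict

# Hypothetical data: customer 1 has one continuous Y period split over two rows,
# and customer 2 has a row whose start date falls between them.
rows = [
    (1, 20100101, "Y"),
    (2, 20100201, "Y"),
    (1, 20100301, "Y"),
]


def island_keys(rows, per_customer):
    """Return (customer, flag, grp) per row, processing rows in StartDateSID order."""
    overall = defaultdict(int)  # overall row number, optionally restarted per customer
    within = defaultdict(int)   # row number within (customer, flag)
    keys = []
    for customer, start, flag in sorted(rows, key=lambda r: r[1]):
        scope = customer if per_customer else None
        overall[scope] += 1
        within[(customer, flag)] += 1
        keys.append((customer, flag, overall[scope] - within[(customer, flag)]))
    return keys


# Table-wide numbering gives customer 1 two different grp values for one island
print(island_keys(rows, per_customer=False))
# Per-customer numbering keeps customer 1 in a single island
print(island_keys(rows, per_customer=True))
```

With a single customer, as in the sample data from the question, both variants produce the same islands.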

Demo here