2016-03-15

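Merge rows by start date and end date

In Redshift, using a SQL script, I need to merge records into a single record whenever the gap between one record's end date and the next record's start date is 32 days or less (<= 32). The merged record should take the minimum start date of the consecutive run as its output start date and the maximum end date as its output end date.

The input data below is the data in the table; the expected output is listed as well. The input data is listed ORDER BY ID, STARTDT, ENDDT in ascending order.

For example, consider ID 100 in the table below. The gap between the end of the first record and the start of the next record is <= 32, but the gap between the second record's end date and the third record's start date is greater than 32 days. The first two records are therefore merged into one record, i.e. (ID), MIN(STARTDT), MAX(ENDDT), which corresponds to the first record in the expected output. Likewise, the gap between records 3 and 4 in the input data falls within 32 days, so these two records are merged into a single record, which corresponds to the second record in the expected output.

Input data: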

ID STARTDT ENDDT 
100 2000-01-01 2000-01-31 
100 2000-02-01 2000-02-29 
100 2000-05-01 2000-05-31 
100 2000-06-01 2000-06-30 
100 2000-09-01 2000-09-30 
100 2000-10-01 2000-10-31 
101 2012-06-01 2012-06-30 
101 2012-07-01 2012-07-31 
102 2000-01-01 2000-01-31 
103 2013-03-01 2013-03-31 
103 2013-05-01 2013-05-31 
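
To try the queries in this thread against the sample rows, a setup script along these lines could be used. This is only a sketch: the question does not give the DDL, so the table name `date_test` (borrowed from one of the answers) and the column types are assumptions.

```sql
-- Hypothetical table definition; name and column types are assumed, not from the question.
CREATE TABLE date_test (
    id      INTEGER,
    startdt DATE,
    enddt   DATE
);

-- Sample rows copied from the input data listed above.
INSERT INTO date_test (id, startdt, enddt) VALUES
    (100, '2000-01-01', '2000-01-31'),
    (100, '2000-02-01', '2000-02-29'),
    (100, '2000-05-01', '2000-05-31'),
    (100, '2000-06-01', '2000-06-30'),
    (100, '2000-09-01', '2000-09-30'),
    (100, '2000-10-01', '2000-10-31'),
    (101, '2012-06-01', '2012-06-30'),
    (101, '2012-07-01', '2012-07-31'),
    (102, '2000-01-01', '2000-01-31'),
    (103, '2013-03-01', '2013-03-31'),
    (103, '2013-05-01', '2013-05-31');
```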

Expected output:

ID MIN_STARTDT MAX_END_DT 
100 2000-01-01 2000-02-29 
100 2000-05-01 2000-06-30 
100 2000-09-01 2000-10-31 
101 2012-06-01 2012-07-31 
102 2000-01-01 2000-01-31 
103 2013-03-01 2013-03-31 
103 2013-05-01 2013-05-31 

Answers

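You can do this in steps:

  • Use a join to find out whether two adjacent records should be merged.
  • Then do a cumulative sum to assign a grouping identifier to each run of such adjacent records.
  • Aggregate.

It looks like: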

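select id, min(startdt) as min_startdt, max(enddt) as max_end_dt
from (select t.*,
             -- running total of "no adjacent predecessor" flags:
             -- it increases by 1 at the first row of each new group
             sum(case when tprev.id is null then 1 else 0 end) over
                 (partition by t.id
                  order by t.startdt
                  rows between unbounded preceding and current row
                 ) as grp
      from t left join
           t tprev
           -- tprev is the record that ends exactly one day before t starts
           on t.id = tprev.id and
              t.startdt = tprev.enddt + interval '1 day'
     ) grouped
group by id, grp
order by id, min_startdt;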
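
The join above only links a record to one that ends exactly the day before it starts. As a variation, the same three steps can be sketched with `lag()` instead of a self-join, applying the 32-day threshold from the question directly. This is an illustrative sketch rather than the answerer's query; the table name `t` follows the query above, and `new_group` and `grp` are made-up column names.

```sql
-- Sketch: flag the first row of each run with lag(), number the runs with a running sum, then aggregate.
with flagged as (
    select id, startdt, enddt,
           -- 0 when this row starts within 32 days of the previous row's end,
           -- 1 otherwise (including the first row of each id, where lag() is null)
           case when datediff(day,
                              lag(enddt) over (partition by id order by startdt, enddt),
                              startdt) <= 32
                then 0 else 1
           end as new_group
    from t
),
grouped as (
    select id, startdt, enddt,
           sum(new_group) over
               (partition by id
                order by startdt, enddt
                rows between unbounded preceding and current row
               ) as grp
    from flagged
)
select id, min(startdt) as min_startdt, max(enddt) as max_end_dt
from grouped
group by id, grp
order by id, min_startdt;
```

Under this rule the two rows for ID 103, whose gap is 31 days, fall into the same group.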

The query does not work: "00918. 00000 - 'column ambiguously defined'" at line 6.

The query above does not give the desired result.


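This question is very similar to another one, and my answer is much the same: Fetch rows based on condition

The gist of the idea is to use window functions to identify the transitions between periods (events less than 33 days apart belong to the same period), then filter out the rows that lie inside a period, and then apply a window function again.

Full solution: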
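
As an illustration of the transition-detection step alone, the gaps on either side of each row can be computed as below. This is a simplified sketch, not part of the answer's query: it assumes the data sits in a table named `date_test`, uses `lag()` where a reversed `lead()` would also work, and `gap_to_next` and `gap_from_prev` are made-up column names.

```sql
-- Sketch of the transition-detection step only.
SELECT
    id,
    startdt,
    enddt,
    -- days from this row's end to the next row's start (NULL on the last row of an id)
    datediff(day, enddt,
             lead(startdt) OVER (PARTITION BY id ORDER BY enddt)) AS gap_to_next,
    -- days from the previous row's end to this row's start (NULL on the first row of an id)
    datediff(day,
             lag(enddt) OVER (PARTITION BY id ORDER BY enddt),
             startdt) AS gap_from_prev
FROM date_test
ORDER BY id, enddt;
```

A row whose `gap_to_next` exceeds 32 closes a period, and a row whose `gap_from_prev` exceeds 32 opens one; rows with neither property lie inside a period or at the very start or end of an ID.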

SELECT
    id,
    startdt AS period_start,
    period_end
FROM (
    SELECT
        id,
        startdt,
        enddt,
        lead(enddt, 1)
            OVER (PARTITION BY id
                  ORDER BY enddt) AS period_end,
        period_boundary
    FROM (
        SELECT
            id,
            startdt,
            enddt,
            CASE WHEN period_switch = 0 AND reverse_period_switch = 1
                 THEN 'start'
                 ELSE 'end' END AS period_boundary
        FROM (
            SELECT
                id,
                startdt,
                enddt,
                CASE WHEN datediff(days, enddt, lead(startdt, 1)
                              OVER (PARTITION BY id
                                    ORDER BY enddt ASC)) > 32
                     THEN 1
                     ELSE 0 END AS period_switch,
                CASE WHEN datediff(days, lead(enddt, 1)
                              OVER (PARTITION BY id
                                    ORDER BY enddt DESC), startdt) > 32
                     THEN 1
                     ELSE 0 END AS reverse_period_switch
            FROM date_test
        ) AS sessioned
        WHERE period_switch != 0 OR reverse_period_switch != 0
        UNION
        SELECT -- adding start rows without transition
            id,
            startdt,
            enddt,
            'start'
        FROM (
            SELECT
                id,
                startdt,
                enddt,
                row_number()
                    OVER (PARTITION BY id
                          ORDER BY enddt ASC) AS row_num
            FROM date_test
        ) AS with_row_number
        WHERE row_num = 1
        UNION
        SELECT -- adding end rows without transition
            id,
            startdt,
            enddt,
            'end'
        FROM (
            SELECT
                id,
                startdt,
                enddt,
                row_number()
                    OVER (PARTITION BY id
                          ORDER BY enddt DESC) AS row_num
            FROM date_test
        ) AS with_row_number
        WHERE row_num = 1
    ) AS with_boundary -- data set containing start/end boundaries
) AS with_end -- data set where end date is propagated into the start row of the period
WHERE period_boundary = 'start'
ORDER BY id, startdt ASC;

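Note that in your expected output you have the row 103 2013-05-01 2013-05-31, but its start date is only 31 days after the end date of the previous row, so according to your requirements this row should instead be merged with the previous row for ID 103.

So the output I get looks like this: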
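
The 31-day figure can be checked directly with `datediff`; the one-liner below is just a sanity check on the two dates quoted above.

```sql
-- Days between the end of the first ID 103 row and the start of the second.
SELECT datediff(day, '2013-03-31', '2013-05-01') AS gap_days;  -- 31, which is <= 32
```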

id start  end 
100 2000-01-01 2000-02-29 
100 2000-05-01 2000-06-30 
100 2000-09-01 2000-10-31 
101 2012-06-01 2012-07-31 
102 2000-01-01 2000-01-31 
103 2013-03-01 2013-05-31 