2014-02-27 36 views
4

Merging contiguous date validity intervals

I have a series of records containing some information (product type) with temporal validity.

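I would like to merge adjacent validity intervals as long as the grouping information (product type) stays the same. I cannot use a simple GROUP BY with MIN and MAX, because some product types (A, for example) can "disappear" and "come back".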
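To make the problem concrete, here is a minimal sketch in Python (not Oracle SQL) of what a plain GROUP BY with MIN and MAX would do to the sample data. The function name `naive_min_max` and the tuple layout are assumptions made for this illustration only.

```python
from datetime import date

# Rows from the question: (product, start_date, end_date).
rows = [
    ("A", date(2013, 7, 1), date(2013, 7, 31)),
    ("A", date(2013, 8, 1), date(2013, 8, 31)),
    ("A", date(2013, 9, 1), date(2013, 9, 30)),
    ("B", date(2013, 10, 1), date(2013, 10, 31)),
    ("B", date(2013, 11, 1), date(2013, 11, 30)),
    ("A", date(2013, 12, 1), date(2013, 12, 31)),
    ("A", date(2014, 1, 1), date(2014, 1, 31)),
    ("A", date(2014, 2, 1), date(2014, 2, 28)),
    ("A", date(2014, 3, 1), date(2014, 3, 31)),
]


def naive_min_max(rows):
    """Mimic GROUP BY product with MIN(start_date) and MAX(end_date)."""
    spans = {}
    for product, start, end in rows:
        lo, hi = spans.get(product, (start, end))
        spans[product] = (min(lo, start), max(hi, end))
    return spans


for product, (start, end) in sorted(naive_min_max(rows).items()):
    print(product, start, end)
```

Product A collapses into a single span from July 2013 to March 2014, which wrongly swallows the two months that belong to B.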

I am using Oracle 11g.

A similar question for MySQL is: How can I do a contiguous group by in MySQL?

Input data

| PRODUCT |      START_DATE |       END_DATE | 
|---------|----------------------------------|----------------------------------| 
|  A |  July, 01 2013 00:00:00+0000 |  July, 31 2013 00:00:00+0000 | 
|  A | August, 01 2013 00:00:00+0000 | August, 31 2013 00:00:00+0000 | 
|  A | September, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 | 
|  B | October, 01 2013 00:00:00+0000 | October, 31 2013 00:00:00+0000 | 
|  B | November, 01 2013 00:00:00+0000 | November, 30 2013 00:00:00+0000 | 
|  A | December, 01 2013 00:00:00+0000 | December, 31 2013 00:00:00+0000 | 
|  A | January, 01 2014 00:00:00+0000 | January, 31 2014 00:00:00+0000 | 
|  A | February, 01 2014 00:00:00+0000 | February, 28 2014 00:00:00+0000 | 
|  A |  March, 01 2014 00:00:00+0000 |  March, 31 2014 00:00:00+0000 | 

Expected results

| PRODUCT |      START_DATE |       END_DATE | 
|---------|---------------------------------|----------------------------------| 
|  A |  July, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 | 
|  B | October, 01 2013 00:00:00+0000 | November, 30 2013 00:00:00+0000 | 
|  A | December, 01 2013 00:00:00+0000 |  March, 31 2014 00:00:00+0000 | 

See the complete SQL Fiddle

+0

This is easier to do with any scripting language – Alexander

+0

I think it can be solved with analytic functions. –

Answers

6

This is a gaps-and-islands problem. There are many ways to approach it; this one uses the lead and lag analytic functions:

select distinct product,
    case when start_date is null
        then lag(start_date) over (partition by product order by rn)
        else start_date end as start_date,
    case when end_date is null
        then lead(end_date) over (partition by product order by rn)
        else end_date end as end_date
from (
    select product, start_date, end_date, rn
    from (
        select t.product,
            case when lag(end_date)
                    over (partition by product order by start_date) is null
                or lag(end_date)
                    over (partition by product order by start_date) != start_date - 1
                then start_date end as start_date,
            case when lead(start_date)
                    over (partition by product order by start_date) is null
                or lead(start_date)
                    over (partition by product order by start_date) != end_date + 1
                then end_date end as end_date,
            row_number() over (partition by product order by start_date) as rn
        from t
    )
    where start_date is not null or end_date is not null
)
order by start_date, product;

PRODUCT START_DATE END_DATE 
------- ---------- --------- 
A  01-JUL-13 30-SEP-13 
B  01-OCT-13 30-NOV-13 
A  01-DEC-13 31-MAR-14 

SQL Fiddle

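The innermost query looks at the preceding and following records for the same product, and keeps the start and/or end date only where the record is not contiguous with its neighbour: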

select t.product,
    case when lag(end_date)
            over (partition by product order by start_date) is null
        or lag(end_date)
            over (partition by product order by start_date) != start_date - 1
        then start_date end as start_date,
    case when lead(start_date)
            over (partition by product order by start_date) is null
        or lead(start_date)
            over (partition by product order by start_date) != end_date + 1
        then end_date end as end_date
from t;

PRODUCT START_DATE END_DATE
------- ---------- ---------
A       01-JUL-13
A
A                  30-SEP-13
A       01-DEC-13
A
A
A                  31-MAR-14
B       01-OCT-13
B                  30-NOV-13
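The boundary-marking step can also be sketched outside the database. The following Python is only an illustration of the same lag/lead comparison, assuming date-only values and non-overlapping periods per product; `mark_boundaries` and `sample` are hypothetical names, and the sample is a shortened version of the question's data.

```python
from datetime import date, timedelta
from itertools import groupby

ONE_DAY = timedelta(days=1)

# Shortened sample: (product, start_date, end_date).
sample = [
    ("A", date(2013, 7, 1), date(2013, 7, 31)),
    ("A", date(2013, 8, 1), date(2013, 8, 31)),
    ("A", date(2013, 9, 1), date(2013, 9, 30)),
    ("B", date(2013, 10, 1), date(2013, 10, 31)),
    ("A", date(2013, 12, 1), date(2013, 12, 31)),
]


def mark_boundaries(rows):
    """Blank out (None) every start or end date that touches a neighbour."""
    marked = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for product, group in groupby(ordered, key=lambda r: r[0]):
        periods = list(group)
        for i, (_, start, end) in enumerate(periods):
            # Emulates lag(end_date) and lead(start_date) within the product.
            prev_end = periods[i - 1][2] if i > 0 else None
            next_start = periods[i + 1][1] if i + 1 < len(periods) else None
            keep_start = prev_end is None or prev_end != start - ONE_DAY
            keep_end = next_start is None or next_start != end + ONE_DAY
            marked.append((product,
                           start if keep_start else None,
                           end if keep_end else None))
    return marked


for row in mark_boundaries(sample):
    print(row)
```

A row that keeps both dates, such as the December period in this shortened sample, is an island consisting of a single record.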

The next level of select removes the intermediate periods, where both dates were blanked out by the inner query, which gives:

PRODUCT START_DATE END_DATE
------- ---------- ---------
A       01-JUL-13
A                  30-SEP-13
A       01-DEC-13
A                  31-MAR-14
B       01-OCT-13
B                  30-NOV-13

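The outer query then collapses those adjacent pairs. I've taken the easy route of creating duplicates and then eliminating them with distinct. You could do it other ways, such as putting both values into one row of each pair, leaving both values null in the other row, and then removing those rows with another layer of select, but I think distinct is fine here.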
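As an illustration of that collapse step, here is a small Python sketch, again not Oracle SQL. It assumes the boundary rows are already filtered and ordered per product; the names `boundary_rows` and `collapse` are made up for this example, and a Python set plays the role of distinct.

```python
from datetime import date

# Boundary rows per product after the all-blank rows were removed:
# (start_date, end_date), where None marks a value blanked by the inner query.
boundary_rows = {
    "A": [
        (date(2013, 7, 1), None),
        (None, date(2013, 9, 30)),
        (date(2013, 12, 1), None),
        (None, date(2014, 3, 31)),
    ],
    "B": [
        (date(2013, 10, 1), None),
        (None, date(2013, 11, 30)),
    ],
}


def collapse(boundary_rows):
    """Fill each blank from its neighbour, then drop the duplicate rows."""
    merged = set()
    for product, rows in boundary_rows.items():
        for i, (start, end) in enumerate(rows):
            if start is None:
                start = rows[i - 1][0]  # like lag(start_date) ordered by rn
            if end is None:
                end = rows[i + 1][1]    # like lead(end_date) ordered by rn
            merged.add((product, start, end))  # the set removes duplicates
    return sorted(merged, key=lambda r: (r[1], r[0]))


for row in collapse(boundary_rows):
    print(row)
```

Each pair of boundary rows produces the same filled-in tuple twice, so the set keeps one row per island.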

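If your real-world use case has times, not just dates, then you'll need to adjust the comparison in the inner query: rather than +/- 1, use an interval of 1 second perhaps, or 1/86400 if you prefer, depending on the precision of your values.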
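The effect of the step size can be shown with a short Python sketch. This is an illustration only; `is_contiguous` is a hypothetical helper, and the right step depends on how your end values are stored.

```python
from datetime import datetime, timedelta


def is_contiguous(prev_end, next_start, step):
    """True when next_start falls exactly one `step` after prev_end."""
    return prev_end + step == next_start


# Date-only values: the next period starts one day after the previous end.
print(is_contiguous(datetime(2013, 7, 31), datetime(2013, 8, 1),
                    timedelta(days=1)))

# Values with a time part, ending at 23:59:59: the step must be one second.
prev_end = datetime(2013, 7, 31, 23, 59, 59)
next_start = datetime(2013, 8, 1, 0, 0, 0)
print(is_contiguous(prev_end, next_start, timedelta(seconds=1)))
print(is_contiguous(prev_end, next_start, timedelta(days=1)))

# 1/86400 of a day is the same step as one second.
print(timedelta(days=1) / 86400 == timedelta(seconds=1))
```

With a one-day step the second-precision periods no longer look adjacent, which is why the comparison has to match the precision of the data.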

+0

"Gaps and islands". Now I can give this problem a name. Thanks! –

+0

@DaniloPiazzalunga - Yes, I really should have added that tag; if you search for it there are around 180 questions, so you may find some other ideas and approaches. –

-1

Try something like this:

with dat as (
select 'A' as product, sysdate-3 as start_dte, sysdate-2 as end_dte from dual 
union all 
select 'A' as product, sysdate-2 as start_dte, sysdate-1 as end_dte from dual 
union all 
select 'B' as product, sysdate-5 as start_dte, sysdate-4 as end_dte from dual 
) 
SELECT product, 
     MIN(start_dte) KEEP (DENSE_RANK FIRST ORDER BY start_dte) "Start", 
     MAX(end_dte) KEEP (DENSE_RANK LAST ORDER BY end_dte) "End" 
    FROM dat 
    GROUP BY product 
    ORDER BY product; 

Output

PRODUCT Start End 
A 2/24/2014 10:25:53 AM 2/26/2014 10:25:53 AM 
B 2/22/2014 10:25:53 AM 2/23/2014 10:25:53 AM 
+0

This won't work: with the original data set it returns only one row for product A: [http://sqlfiddle.com/#!4/6d1e6/3] –

+0

Sorry, I think I misunderstood what you wanted – tbone

0

This is a fairly convoluted series of steps, but it is the approach I used to solve a similar problem:

-- Sample Data 
CREATE TABLE AdjacentValidity 
    (
RowID INT IDENTITY(1,1) NOT NULL, 
Product VARCHAR(1) NOT NULL, 
Start_Date DATETIME NOT NULL, 
End_Date DATETIME NOT NULL 
) 

INSERT INTO AdjacentValidity (Product, Start_Date, End_Date) 

SELECT 'A', '7/1/2013', '7/31/2013' UNION 
SELECT 'A', '8/1/2013', '8/31/2013' UNION 
SELECT 'A', '9/1/2013', '9/30/2013' UNION 
SELECT 'B', '10/1/2013', '10/31/2013' UNION 
SELECT 'B', '11/1/2013', '11/30/2013' UNION 
SELECT 'A', '12/1/2013', '12/31/2013' UNION 
SELECT 'A', '1/1/2014', '1/31/2014' UNION 
SELECT 'A', '2/1/2014', '2/28/2014' UNION 
SELECT 'A', '3/1/2014', '3/31/2014' 


-- Modify the sample data to include necessary tags 
CREATE TABLE #RawData 
    (
    RawData_ID INT IDENTITY(1,1) NOT NULL, 
    Product VARCHAR(1) NOT NULL, 
    Start_Date DATETIME NOT NULL, 
    End_Date DATETIME NOT NULL, 
    isFirstOccurrence BIT NULL, 
    isLastOccurrence BIT NULL, 
    isFirstInstance BIT NULL, 
    isLastInstance BIT NULL 
) 

-- Load and flag first occurrences of a natural key 
INSERT INTO #RawData 
    (
    Product, 
    Start_Date, 
    End_Date, 
    isFirstOccurrence 
) 
SELECT 
    Product, 
    Start_Date, 
    End_Date, 
    CASE WHEN ROW_NUMBER() OVER 
     (
     --PARTITION BY <NaturalKey> 
     ORDER BY Start_date 
    ) = 1 THEN 1 ELSE 0 END AS isFirstOccurrence 
FROM AdjacentValidity 

-- update to flag the last sequential instance of a particular data set, and the last occurrence of a natural key 
UPDATE a 
SET 
    a.isLastInstance = 
    CASE 
     WHEN 
     a.Product <> b.Product OR 
     DATEADD(m, 1, a.Start_Date) <> b.Start_Date OR 
     b.RawData_ID IS NULL 
     THEN 1 
     ELSE 0 
    END, 
    a.isLastOccurrence = 
    CASE 
     WHEN 
     b.RawData_ID IS NULL 
     THEN 1 
     ELSE 0 
    END 
FROM 
    #RawData a 
    LEFT JOIN 
    #RawData b ON 
     b.RawData_ID = a.RawData_ID + 1 --AND 
     --b.<NaturalKey> = a.<NaturalKey> 

-- flag first sequential instance of a particular data set 
UPDATE b 
SET 
    b.isFirstInstance = 
    CASE 
     WHEN 
     a.isLastInstance = 1 
     THEN 1 
     ELSE 0 
    END 
FROM 
    #RawData a 
    LEFT JOIN 
    #RawData b ON 
     b.RawData_ID = a.RawData_ID + 1 --AND 
     --b.<NaturalKey> = a.<NaturalKey> 


-- reduce the records to only those that are the first or last occurrence of a particular data set 
CREATE TABLE #UniqueData 
    (
    [UniqueData_ID] [int] IDENTITY(1,1) NOT NULL, 
    Start_Date DATETIME NOT NULL, 
    End_Date DATETIME NOT NULL, 
    Product VARCHAR(1) NULL, 
    isFirstOccurrence BIT NULL, 
    isLastOccurrence BIT NULL, 
    isFirstInstance BIT NULL, 
    isLastInstance BIT NULL 
) 

INSERT INTO #UniqueData 
    (
    Start_Date, 
    End_Date, 
    Product, 
    isFirstOccurrence, 
    isLastOccurrence, 
    isFirstInstance, 
    isLastInstance 
) 

SELECT 
    Start_Date, 
    End_Date, 
    Product, 
    isFirstOccurrence, 
    isLastOccurrence, 
    isFirstInstance, 
    isLastInstance 
FROM 
    #RawData 
WHERE 
    isFirstOccurrence = 1 OR 
    isFirstInstance = 1 OR 
    isLastInstance = 1 
ORDER BY RawData_ID, Start_Date 




-- combine the first and last occurrences in any given sequence into a single row 
SELECT 
    a.Start_Date, 
    ISNULL(b.End_Date, a.End_Date) End_Date, 
    a.Product 
FROM 
    #UniqueData a 
    LEFT JOIN 
    #UniqueData b ON 
     b.UniqueData_ID = a.UniqueData_ID + 1 AND 
     --b.<NaturalKey> = a.<NaturalKey> AND 
     a.isLastInstance <> 1 
WHERE a.isFirstInstance = 1 or a.isFirstOccurrence = 1 
ORDER BY a.UniqueData_ID 



-- clean up 
/* 
DROP TABLE AdjacentValidity 
DROP TABLE #RawData 
DROP TABLE #UniqueData 
*/ 
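The idea behind the flags (walk the rows in overall date order and start a new group whenever the product changes or a gap appears) can be condensed into a single pass. The Python below is a simplified variant written for illustration, not a translation of the T-SQL above: it tests adjacency as end date plus one day rather than start date plus one month, and `merge_runs` is a hypothetical name.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

rows = [
    ("A", date(2013, 7, 1), date(2013, 7, 31)),
    ("A", date(2013, 8, 1), date(2013, 8, 31)),
    ("A", date(2013, 9, 1), date(2013, 9, 30)),
    ("B", date(2013, 10, 1), date(2013, 10, 31)),
    ("B", date(2013, 11, 1), date(2013, 11, 30)),
    ("A", date(2013, 12, 1), date(2013, 12, 31)),
    ("A", date(2014, 1, 1), date(2014, 1, 31)),
    ("A", date(2014, 2, 1), date(2014, 2, 28)),
    ("A", date(2014, 3, 1), date(2014, 3, 31)),
]


def merge_runs(rows):
    """Merge consecutive rows (in start-date order) of the same product."""
    merged = []
    for product, start, end in sorted(rows, key=lambda r: r[1]):
        last = merged[-1] if merged else None
        if last and last[0] == product and last[2] + ONE_DAY == start:
            # Same product and no gap: extend the current run.
            merged[-1] = (product, last[1], end)
        else:
            # Product changed or a gap appeared: start a new run.
            merged.append((product, start, end))
    return merged


for row in merge_runs(rows):
    print(row)
```

Because the rows are ordered by date across all products, this sketch assumes that periods of different products do not overlap in time.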
+0

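The other approaches I tried would not let me preserve the 'order' of events when the product starts as A, goes to B, and then returns to A. If you have a natural key you are trying to preserve, you will also have to include it in the temp tables and joins - I've left it in the JOIN conditions (commented out), but you have to remember to add it to all of the tables. – AHiggins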

2

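It seems like there should be an easier way, but a combination of an analytic query (to find the distinct gaps) and a hierarchical query (to connect the contiguous rows) works: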

with data as (
    select 'A' product, to_date('7/1/2013', 'MM/DD/YYYY') start_date, to_date('7/31/2013', 'MM/DD/YYYY') end_date from dual union all 
    select 'A' product, to_date('8/1/2013', 'MM/DD/YYYY') start_date, to_date('8/31/2013', 'MM/DD/YYYY') end_date from dual union all 
    select 'A' product, to_date('9/1/2013', 'MM/DD/YYYY') start_date, to_date('9/30/2013', 'MM/DD/YYYY') end_date from dual union all 
    select 'B' product, to_date('10/1/2013', 'MM/DD/YYYY') start_date, to_date('10/31/2013', 'MM/DD/YYYY') end_date from dual union all 
    select 'B' product, to_date('11/1/2013', 'MM/DD/YYYY') start_date, to_date('11/30/2013', 'MM/DD/YYYY') end_date from dual union all 
    select 'A' product, to_date('12/1/2013', 'MM/DD/YYYY') start_date, to_date('12/31/2013', 'MM/DD/YYYY') end_date from dual union all 
    select 'A' product, to_date('1/1/2014', 'MM/DD/YYYY') start_date, to_date('1/31/2014', 'MM/DD/YYYY') end_date from dual union all 
    select 'A' product, to_date('2/1/2014', 'MM/DD/YYYY') start_date, to_date('2/28/2014', 'MM/DD/YYYY') end_date from dual union all 
    select 'A' product, to_date('3/1/2014', 'MM/DD/YYYY') start_date, to_date('3/31/2014', 'MM/DD/YYYY') end_date from dual 
), 
start_points as 
(
    select product, start_date, end_date, prior_end+1, case when prior_end + 1 = start_date then null else 'Y' end start_point 
    from (
     select product, start_date, end_date, lag(end_date,1) over (partition by product order by end_date) prior_end 
     from data 
    ) 
) 
select product, min(start_date) start_date, max(end_date) end_date 
from (
    select product, start_date, end_date, level, connect_by_root(start_date) root_start 
    from start_points 
    start with start_point = 'Y' 
    connect by prior end_date = start_date - 1 
    and prior product = product 
) 
group by product, root_start; 



PRODUCT START_DATE END_DATE 
------- ---------- --------- 
A  01-JUL-13 30-SEP-13 
A  01-DEC-13 31-MAR-14 
B  01-OCT-13 30-NOV-13 
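The two stages of this query (find the start points, then follow each chain of contiguous rows) can be sketched in Python as well. This is an illustration under stated assumptions: periods of one product do not overlap, each (product, start date) pair is unique, and `merge_by_chaining` is a hypothetical name.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

rows = [
    ("A", date(2013, 7, 1), date(2013, 7, 31)),
    ("A", date(2013, 8, 1), date(2013, 8, 31)),
    ("A", date(2013, 9, 1), date(2013, 9, 30)),
    ("B", date(2013, 10, 1), date(2013, 10, 31)),
    ("B", date(2013, 11, 1), date(2013, 11, 30)),
    ("A", date(2013, 12, 1), date(2013, 12, 31)),
    ("A", date(2014, 1, 1), date(2014, 1, 31)),
    ("A", date(2014, 2, 1), date(2014, 2, 28)),
    ("A", date(2014, 3, 1), date(2014, 3, 31)),
]


def merge_by_chaining(rows):
    """Start at rows with no contiguous predecessor and follow the chain."""
    end_by_start = {(p, s): e for p, s, e in rows}
    known_ends = {(p, e) for p, _, e in rows}
    result = []
    for product, start, end in rows:
        if (product, start - ONE_DAY) in known_ends:
            continue  # a row of the same product ends the day before
        # Like CONNECT BY PRIOR end_date = start_date - 1 for one product.
        last_end = end
        while (product, last_end + ONE_DAY) in end_by_start:
            last_end = end_by_start[(product, last_end + ONE_DAY)]
        result.append((product, start, last_end))
    return sorted(result)


for row in merge_by_chaining(rows):
    print(row)
```

The start date of each chain plays the role of `connect_by_root(start_date)` in the query: it identifies the island that every row of the chain belongs to.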
+0

Nice. [Fiddle](http://sqlfiddle.com/#!4/6d1e6/6). –