2017-07-19

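SQL select max of consecutive run data

Given a table of consecutive run data, where a number always increases while a task is in progress and resets to zero when the next task starts, how can I select the maximum value of each run?

Each run can have any number of rows, and each run is delimited by a 'start' row and an 'end' row. For example, the data might look like this: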

user_id, action, qty, datetime 
1,  start, 0, 2017-01-01 00:00:01 
1,  record, 0, 2017-01-01 00:00:01 
1,  record, 4, 2017-01-01 00:00:02 
1,  record, 5, 2017-01-01 00:00:03 
1,  record, 6, 2017-01-01 00:00:04 
1,  end, 0, 2017-01-01 00:00:04 
1,  start, 0, 2017-01-01 00:00:05 
1,  record, 0, 2017-01-01 00:00:05 
1,  record, 2, 2017-01-01 00:00:06 
1,  record, 3, 2017-01-01 00:00:07 
1,  end, 0, 2017-01-01 00:00:07 
2,  start, 0, 2017-01-01 00:00:08 
2,  record, 0, 2017-01-01 00:00:08 
2,  record, 3, 2017-01-01 00:00:09 
2,  record, 8, 2017-01-01 00:00:10 
2,  end, 0, 2017-01-01 00:00:10 

The result would be the maximum of each run:

user_id, action, qty, datetime 
1,  record, 6, 2017-01-01 00:00:04 
1,  record, 3, 2017-01-01 00:00:07 
2,  record, 8, 2017-01-01 00:00:10  

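Can this be done with any Postgres SQL syntax (9.3)? It is some kind of grouping followed by selecting the maximum from each group, but I don't know how to do the grouping part.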

Can the same user_id have 2 overlapping runs (e.g. from different sessions)?

There is no overlap for a single user; the next run always starts at a later time.

Answers

1

Quick and dirty, assuming runs do not overlap:

-- Pair the n-th 'start' with the n-th 'end' (ordered by time) to get each run's bounds.
with bounds as (select starts.rn, starts.datetime as s, ends.datetime as e from 
(select datetime, ROW_NUMBER() OVER (ORDER BY datetime) as rn from runs where action = 'start') as starts 
    join 
(select datetime, ROW_NUMBER() OVER (ORDER BY datetime) as rn from runs where action = 'end') as ends 
on starts.rn = ends.rn) 
-- Tag every row with the run whose time window contains it.
,with_run as (SELECT *, (select rn from bounds where s <= r.datetime and e >= r.datetime) as run 
    from runs as r) 
-- Highest qty per run.
,max_qty as (
SELECT run,max(qty) as qty 
    from with_run 
GROUP BY run) 
SELECT s.user_id,s.action,s.qty,s.datetime from with_run as s join max_qty as f on s.run = f.run AND s.qty = f.qty; 

-- Test data --

create table runs (user_id int, action text, qty int, datetime TIMESTAMP); 
insert INTO runs VALUES 
(1,  'start', 0, '2017-01-01 00:00:01') 
,(1,  'record', 0, '2017-01-01 00:00:01') 
,(1,  'record', 4, '2017-01-01 00:00:02') 
,(1,  'record', 5, '2017-01-01 00:00:03') 
,(1,  'record', 6, '2017-01-01 00:00:04') 
,(1,  'end', 0, '2017-01-01 00:00:04') 
,(1,  'start', 0, '2017-01-01 00:00:05') 
,(1,  'record', 0, '2017-01-01 00:00:05') 
,(1,  'record', 2, '2017-01-01 00:00:06') 
,(1,  'record', 3, '2017-01-01 00:00:07') 
,(1,  'end', 0, '2017-01-01 00:00:07') 
,(2,  'start', 0, '2017-01-01 00:00:08') 
,(2,  'record', 0, '2017-01-01 00:00:08') 
,(2,  'record', 3, '2017-01-01 00:00:09') 
,(2,  'record', 8, '2017-01-01 00:00:10') 
,(2,  'end', 0, '2017-01-01 00:00:10'); 

UPDATE: @Oto Shavadze's answer can be shortened:

with lookup as (select action,lag(t.*) over(order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end) as r from runs t) 
select (r::runs).user_id 
     ,(r::runs).action 
     ,(r::runs).qty 
     ,(r::runs).datetime 
from lookup where action = 'end'; 

I think the OP is unclear about what counts as the maximum: the last record before the end row, or the highest qty within the run.
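If the second reading is intended (the highest qty anywhere in the run), one way to make the grouping step explicit is a running count of 'start' rows per user. This is only a minimal sketch, not taken from either answer: it assumes the runs table from the test data above, and the names numbered and run_no are made up for illustration.

```sql
-- Sketch: number the runs with a running count of 'start' rows, then keep
-- the highest-qty 'record' row of each (user_id, run_no) group.
-- Assumes the "runs" table from the test data; "numbered" and "run_no"
-- are illustrative names only.
with numbered as (
    select r.*,
           sum(case when action = 'start' then 1 else 0 end)
               over (partition by user_id
                     order by datetime,
                              case when action = 'start' then 0
                                   when action = 'record' then 1
                                   else 2 end,
                              qty) as run_no
    from runs as r
)
select distinct on (user_id, run_no)
       user_id, action, qty, datetime
from numbered
where action = 'record'
order by user_id, run_no, qty desc, datetime;
```

Because qty only ever increases within a run, both readings should give the same rows on the sample data; they would differ only if qty could drop before the 'end' row.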

3

If there is no overlap for a single user and the next run always starts at a later time, then you can use the LAG() window function.

with the_table(user_id, action, qty, datetime) as (
    select 1,'start', 0, '2017-01-01 00:00:01'::timestamp union all 
    select 1,'record', 0, '2017-01-01 00:00:01'::timestamp union all 
    select 1,'record', 4, '2017-01-01 00:00:02'::timestamp union all 
    select 1,'record', 5, '2017-01-01 00:00:03'::timestamp union all 
    select 1,'record', 6, '2017-01-01 00:00:04'::timestamp union all 
    select 1,'end', 0, '2017-01-01 00:00:04'::timestamp union all 
    select 1,'start', 0, '2017-01-01 00:00:05'::timestamp union all 
    select 1,'record', 0, '2017-01-01 00:00:05'::timestamp union all 
    select 1,'record', 2, '2017-01-01 00:00:06'::timestamp union all 
    select 1,'record', 3, '2017-01-01 00:00:07'::timestamp union all 
    select 1,'end', 0, '2017-01-01 00:00:07'::timestamp union all 
    select 2,'start', 0, '2017-01-01 00:00:08'::timestamp union all 
    select 2,'record', 0, '2017-01-01 00:00:08'::timestamp union all 
    select 2,'record', 3, '2017-01-01 00:00:09'::timestamp union all 
    select 2,'record', 8, '2017-01-01 00:00:10'::timestamp union all 
    select 2,'end', 0, '2017-01-01 00:00:10'::timestamp 
) 

select n_user_id, n_action, n_qty, n_datetime from (
    select action, 
    lag(user_id) over(partition by user_id order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end, qty) as n_user_id, 
    lag(action) over(partition by user_id order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end, qty) as n_action, 
    lag(qty) over(partition by user_id order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end, qty) as n_qty, 
    lag(datetime) over(partition by user_id order by datetime, case when action = 'start' then 0 when action = 'record' then 1 else 2 end, qty) as n_datetime 
    from the_table 
)t 
where action = 'end' 

Because some action = record rows have the same datetime as the start or end rows, I used CASE in the ORDER BY to make it explicit that start comes first, then record, then end.
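The effect of that tie-breaker can be checked in isolation. The sketch below is illustrative only: it uses an inline VALUES list instead of the real table, and the alias demo and the column pos are assumptions made for the example.

```sql
-- The 'end' row and the last 'record' row share the timestamp 00:00:04.
-- Ordering by datetime alone would leave their relative order unspecified;
-- the CASE expression forces start < record < end within one timestamp.
select action, qty, datetime,
       row_number() over (
           order by datetime,
                    case when action = 'start' then 0
                         when action = 'record' then 1
                         else 2 end,
                    qty) as pos
from (values
        ('end',    0, '2017-01-01 00:00:04'::timestamp),
        ('record', 6, '2017-01-01 00:00:04'::timestamp),
        ('record', 5, '2017-01-01 00:00:03'::timestamp)
     ) as demo(action, qty, datetime)
order by pos;
```

With this ordering the 'end' row sorts last, so LAG() evaluated on it returns the record row with qty 6 rather than depending on whatever order the rows happen to be stored in.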