Generating a time series with daily statistics using a PostgreSQL query

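I find myself having to write a (for me) fairly complex SQL query, and I can't seem to get a grip on it.

I have a table named orders and a related table order_state_history, which records the state of those orders over time (see below).

I now need to generate a series of rows, one per day, containing the number of orders that are in each particular state at the end of that day (see the expected result below). In addition, I only want to consider orders with orders.type = 1.

The data resides in a PostgreSQL database. I have already found out how to generate a time series with GENERATE_SERIES(DATE '2001-01-01', CURRENT_DATE, '1 DAY'::INTERVAL) days, which lets me produce rows for days on which no state change was recorded.

My current approach is to join orders, order_state_history and the generated series of days, trying to filter out all rows with DATE(order_state_history.timestamp) > DATE(days), and then somehow get the final state of each order on that day with first_value(order_state_history.new_state) OVER (PARTITION BY orders.id ORDER BY order_state_history.timestamp DESC). This is where my limited SQL experience runs out.

I just can't wrap my head around the problem.

Can this even be solved in a single query, or would I be better advised to compute the data with some kind of smart script that runs one query per day? What would be a reasonable approach to this problem?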

orders===
id     type
10000  1
10001  1
10002  2
10003  2
10004  1


order_state_history===
order_id  index  timestamp         new_state
10000     1      01.01.2001 12:00  NEW
10000     2      02.01.2001 13:00  ACTIVE
10000     3      03.01.2001 14:00  DONE
10001     1      02.01.2001 13:00  NEW
10002     1      03.01.2001 14:00  NEW
10002     2      05.01.2001 10:00  ACTIVE
10002     3      05.01.2001 14:00  DONE
10003     1      07.01.2001 04:00  NEW
10004     1      05.01.2001 14:00  NEW
10004     2      10.01.2001 17:30  DONE


Expected result===
date        new_orders  active_orders  done_orders
01.01.2001  1           0              0
02.01.2001  1           1              0
03.01.2001  1           0              1
04.01.2001  1           0              1
05.01.2001  2           0              1
06.01.2001  2           0              1
07.01.2001  2           0              1
08.01.2001  2           0              1
09.01.2001  2           0              1
10.01.2001  1           0              2

Source

2017-08-25 Thomas Hilbert

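Please check the expected result (why are there 2 new orders on 03.01?) and add the next expected rows, at least up to 05.01. – klin

I have added all the relevant rows. 03.01. has two new orders because there are new orders on both 02.01. and 03.01. (10001 and 10002). Order 10001 **stays in the NEW state** and is therefore counted on all following days. The counts are totals: the result column 'new_orders' counts all orders that are in the NEW state at the end of the day, regardless of whether their state changed that day. –

But 10002 is of type 2, so shouldn't it be left out of the count? – klin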

Step 1. Compute a cumulative sum of state values for each order, using the values NEW = 1, ACTIVE = 1, DONE = 2:

-- running sum of state weights per order (NEW = 1, ACTIVE = 1, DONE = 2)
select
    order_id, timestamp::date as day,
    sum(case new_state when 'DONE' then 2 else 1 end) over w as state
from order_state_history h
join orders o on o.id = h.order_id
where o.type = 1
window w as (partition by order_id order by timestamp)

 order_id |    day     | state
----------+------------+-------
    10000 | 2001-01-01 |     1
    10000 | 2001-01-02 |     2
    10000 | 2001-01-03 |     4
    10001 | 2001-01-02 |     1
    10004 | 2001-01-05 |     1
    10004 | 2001-01-10 |     3
(6 rows)
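To check the arithmetic of step 1 outside the database, the running sum can be mirrored in a few lines of Python. This is only a sketch: `HISTORY` and `running_state_codes` are hypothetical names, the sample rows are the type-1 rows from order_state_history above, and the timestamps are rewritten in ISO format for convenience.

```python
from collections import defaultdict
from datetime import datetime

# Type-1 rows from order_state_history (timestamps rewritten as ISO strings).
HISTORY = [
    (10000, "2001-01-01 12:00", "NEW"),
    (10000, "2001-01-02 13:00", "ACTIVE"),
    (10000, "2001-01-03 14:00", "DONE"),
    (10001, "2001-01-02 13:00", "NEW"),
    (10004, "2001-01-05 14:00", "NEW"),
    (10004, "2001-01-10 17:30", "DONE"),
]


def running_state_codes(history):
    """Return (order_id, day, code) rows; code is the per-order running sum of weights."""
    parsed = [(oid, datetime.fromisoformat(ts), state) for oid, ts, state in history]
    # Sorting by (order_id, timestamp) plays the role of
    # PARTITION BY order_id ORDER BY timestamp in the SQL window.
    parsed.sort(key=lambda row: (row[0], row[1]))
    totals = defaultdict(int)
    rows = []
    for order_id, ts, new_state in parsed:
        totals[order_id] += 2 if new_state == "DONE" else 1  # DONE = 2, others = 1
        rows.append((order_id, ts.date().isoformat(), totals[order_id]))
    return rows
```

Calling `running_state_codes(HISTORY)` should reproduce the six rows shown above. The weights work because the possible histories NEW, NEW -> ACTIVE, NEW -> DONE and NEW -> ACTIVE -> DONE sum to the distinct codes 1, 2, 3 and 4; this presumably relies on every order moving through the states at most once and never going backwards.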

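Step 2. Compute the transition matrix for each order based on the state from step 1 (1 means the order has just become NEW, 2 means NEW -> ACTIVE, 3 means NEW -> DONE, 4 means ACTIVE -> DONE):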

-- turn each running state code into a +1 / -1 delta per state
select
    order_id, day, state,
    case when state = 1 then 1 when state = 2 or state = 3 then -1 else 0 end as new,
    case when state = 2 then 1 when state = 4 then -1 else 0 end as active,
    case when state > 2 then 1 else 0 end as done
from (
    -- step 1: running sum of state weights per order
    select
        order_id, timestamp::date as day,
        sum(case new_state when 'DONE' then 2 else 1 end) over w as state
    from order_state_history h
    join orders o on o.id = h.order_id
    where o.type = 1
    window w as (partition by order_id order by timestamp)
    ) s

 order_id |    day     | state | new | active | done
----------+------------+-------+-----+--------+------
    10000 | 2001-01-01 |     1 |   1 |      0 |    0
    10000 | 2001-01-02 |     2 |  -1 |      1 |    0
    10000 | 2001-01-03 |     4 |   0 |     -1 |    1
    10001 | 2001-01-02 |     1 |   1 |      0 |    0
    10004 | 2001-01-05 |     1 |   1 |      0 |    0
    10004 | 2001-01-10 |     3 |  -1 |      0 |    1
(6 rows)
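The three CASE expressions in step 2 amount to a small lookup from state code to per-state deltas. A minimal Python sketch of that mapping follows; `transition_deltas` is a hypothetical name used only for illustration.

```python
def transition_deltas(code):
    """Map a running state code to (new, active, done) deltas, mirroring the CASE expressions."""
    # Code 1: order enters NEW. Codes 2 and 3: order leaves NEW.
    new = 1 if code == 1 else -1 if code in (2, 3) else 0
    # Code 2: order enters ACTIVE. Code 4: order leaves ACTIVE.
    active = 1 if code == 2 else -1 if code == 4 else 0
    # Codes 3 and 4: order enters DONE.
    done = 1 if code > 2 else 0
    return new, active, done
```

Each transition adds one to the state being entered and subtracts one from the state being left, so the deltas of a single order always sum to exactly one state being occupied.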

Step 3. Compute the cumulative sum of each state over a series of days:

-- distinct collapses several transitions on the same day into one row per day
select distinct
    day::date,
    sum(new) over w as new,
    sum(active) over w as active,
    sum(done) over w as done
from generate_series('2001-01-01'::date, '2001-01-10', '1d'::interval) day
left join (
    -- step 2: +1 / -1 delta per state for each transition
    select
        order_id, day, state,
        case when state = 1 then 1 when state = 2 or state = 3 then -1 else 0 end as new,
        case when state = 2 then 1 when state = 4 then -1 else 0 end as active,
        case when state > 2 then 1 else 0 end as done
    from (
        -- step 1: running sum of state weights per order
        select
            order_id, timestamp::date as day,
            sum(case new_state when 'DONE' then 2 else 1 end) over w as state
        from order_state_history h
        join orders o on o.id = h.order_id
        where o.type = 1
        window w as (partition by order_id order by timestamp)
        ) s
    ) s
using(day)
window w as (order by day)  -- running totals up to and including each day
order by 1

    day     | new | active | done
------------+-----+--------+------
 2001-01-01 |   1 |      0 |    0
 2001-01-02 |   1 |      1 |    0
 2001-01-03 |   1 |      0 |    1
 2001-01-04 |   1 |      0 |    1
 2001-01-05 |   2 |      0 |    1
 2001-01-06 |   2 |      0 |    1
 2001-01-07 |   2 |      0 |    1
 2001-01-08 |   2 |      0 |    1
 2001-01-09 |   2 |      0 |    1
 2001-01-10 |   1 |      0 |    2
(10 rows)
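The final step is a running total of the step 2 deltas over a gap-free calendar. The Python sketch below mirrors that idea so the result can be compared with the expected table in the question; `DELTAS` and `daily_totals` are hypothetical names, and the delta rows are copied from the step 2 output.

```python
from datetime import date, timedelta

# (day, new, active, done) deltas taken from the step 2 output.
DELTAS = [
    (date(2001, 1, 1), 1, 0, 0),
    (date(2001, 1, 2), -1, 1, 0),
    (date(2001, 1, 3), 0, -1, 1),
    (date(2001, 1, 2), 1, 0, 0),
    (date(2001, 1, 5), 1, 0, 0),
    (date(2001, 1, 10), -1, 0, 1),
]


def daily_totals(deltas, start, end):
    """Return one (day, new, active, done) row per day with cumulative counts."""
    # Sum the deltas of all transitions that happened on the same day.
    per_day = {}
    for day, new, active, done in deltas:
        n, a, d = per_day.get(day, (0, 0, 0))
        per_day[day] = (n + new, a + active, d + done)

    # Walk every day in the range, like generate_series with a left join,
    # so that days without transitions still produce a row.
    rows = []
    new = active = done = 0
    day = start
    while day <= end:
        dn, da, dd = per_day.get(day, (0, 0, 0))
        new, active, done = new + dn, active + da, done + dd
        rows.append((day.isoformat(), new, active, done))
        day += timedelta(days=1)
    return rows
```

`daily_totals(DELTAS, date(2001, 1, 1), date(2001, 1, 10))` should give the same ten rows as the query. In the SQL version, several rows for the same day receive identical sums because a window with ORDER BY and no explicit frame includes all peer rows of the current row, which is why select distinct is enough to get one row per day.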

Source

2017-08-26 01:44:28 klin
