2017-09-13 19 views
5

How to generate a date range + count earlier dates from another table in PostgreSQL?

I have the following table:

links

created_at           active
2017-08-12 15:46:01  false
2017-08-13 15:46:01  true
2017-08-14 15:46:01  true
2017-08-15 15:46:01  false

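Given a date range, I want to extract a time series that shows, for each day, how many active links were created on a date equal to or earlier than that (rolling) day.

Output (date range 2017-08-12 to 2017-08-17):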

day         count
2017-08-12  0   (there are 0 active links created on 2017-08-12 and earlier)
2017-08-13  1   (there is 1 active link created on 2017-08-13 and earlier)
2017-08-14  2   (there are 2 active links created on 2017-08-14 and earlier)
2017-08-15  2   ...
2017-08-16  2
2017-08-17  2

I came up with the following query for generating the dates:

SELECT date_trunc('day', dd):: date 
FROM generate_series 
    ('2017-08-12'::timestamp 
    , '2017-08-17'::timestamp 
    , '1 day'::interval) dd 

But the rolling count confuses me, and I am not sure how to continue. Can this be solved with a window function?

Answers

1

This should be fastest:

SELECT day::date
     , sum(ct) OVER (ORDER BY day) AS count
FROM   generate_series(timestamp '2017-08-12'
                     , timestamp '2017-08-17'
                     , interval  '1 day') day
LEFT   JOIN (
   SELECT date_trunc('day', created_at) AS day, count(*) AS ct
   FROM   tbl
   WHERE  active  -- fastest
   GROUP  BY 1
   ) t USING (day)
ORDER  BY 1;

dbfiddle here

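count() only counts non-null values, so you could use count(active OR NULL). But the fastest option for counting is to exclude irrelevant rows with a WHERE clause to begin with. Since we add all the days with generate_series() anyway, this is the best option.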
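As a small illustration of the point above, the query below puts the counting variants side by side. It is only a sketch: the inline VALUES list is a stand-in for the table (assumed to hold the four sample rows from the question), and the FILTER form is an extra variant that the answer does not mention (it needs PostgreSQL 9.4 or later).

```sql
-- Stand-in for the links table (assumption: the four sample rows from the question).
WITH links(created_at, active) AS (
   VALUES (timestamp '2017-08-12 15:46:01', false)
        , (timestamp '2017-08-13 15:46:01', true)
        , (timestamp '2017-08-14 15:46:01', true)
        , (timestamp '2017-08-15 15:46:01', false)
   )
SELECT count(*)                       AS all_rows        -- counts every row
     , count(active OR NULL)          AS active_or_null  -- false OR NULL is NULL, so it is skipped
     , count(*) FILTER (WHERE active) AS active_filter   -- aggregate FILTER clause
FROM   links;
```

On this sample, all_rows is 4 and the other two columns are 2. A plain count(*) combined with WHERE active gives the same 2, but discards the inactive rows before aggregation, which is why the answer prefers it.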

Since generate_series() returns timestamp (not date), I use date_trunc() to get matching timestamps (very slightly faster).
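One detail worth checking against the expected output in the question: sum() over nothing but NULLs yields NULL, so days before the first active link come out as NULL rather than 0. The sketch below wraps the window sum in COALESCE to get 0 instead. It is a variation on the query above, not the answerer's own text; the VALUES list named links is an assumed stand-in for the real table.

```sql
-- Stand-in for the links table (assumption: the four sample rows from the question).
WITH links(created_at, active) AS (
   VALUES (timestamp '2017-08-12 15:46:01', false)
        , (timestamp '2017-08-13 15:46:01', true)
        , (timestamp '2017-08-14 15:46:01', true)
        , (timestamp '2017-08-15 15:46:01', false)
   )
SELECT day::date
     , COALESCE(sum(ct) OVER (ORDER BY day), 0) AS count  -- 0 instead of NULL before the first active link
FROM   generate_series(timestamp '2017-08-12'
                     , timestamp '2017-08-17'
                     , interval  '1 day') day
LEFT   JOIN (
   SELECT date_trunc('day', created_at) AS day, count(*) AS ct  -- one row per day that has active links
   FROM   links
   WHERE  active
   GROUP  BY 1
   ) t USING (day)
ORDER  BY 1;
```

With the sample rows this should give the running counts 0, 1, 2, 2, 2, 2 for 2017-08-12 through 2017-08-17. Note that sum() over a bigint count returns numeric; add a cast if an integer column is needed.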

1

I would just use aggregation and a cumulative sum, assuming you have at least one row per day:

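select date_trunc('day', created_at)::date as created_date,
       sum(active::int) as actives,  -- active links created on that day
       -- running total: the window needs ORDER BY, and its expression must match the grouped one
       sum(sum(active::int)) over (order by date_trunc('day', created_at)::date) as running_actives
from t
group by created_date
order by created_date;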

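You only need to generate the dates if you have holes in your data. If you do, though, I would suggest including where active. You could include it now as well; I left it out only to make sure that days with nothing but inactive links do not become holes.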

+0

Yes, there are holes; some days are missing. For those days I have to carry forward the count from the most recent existing date.

0

I think a query like this can help you:

with t as (
    select date_trunc('day', dd)::date as day
    from generate_series
        ('2017-08-12'::timestamp
        , '2017-08-17'::timestamp
        , '1 day'::interval) dd
)
select distinct t.day
    -- order the window by the generated day, so days without links keep the running count
    , count(case when links.active then 1 end) over (order by t.day) as count
from t
left join links
on t.day = cast(links.created_at as date)
order by t.day;

SQL Fiddle Demo

0

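If you have missing days in your table, you need to use generate_series() to create them. Since this basically puts the two previous answers together, credit is given ;)

However, the join is better done after the GROUP BY, which returns only one row per day, rather than before it, which would result in a much larger JOIN.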

WITH dailydata AS (
    SELECT d::DATE, COALESCE(n, 0) n
    FROM generate_series(
             '2000-01-01'::DATE,
             '2000-10-01'::DATE,
             '1 DAY'::INTERVAL) d
    LEFT JOIN
        (SELECT created_at::DATE d, count(*) AS n
         FROM links WHERE active
         GROUP BY d) data
    USING (d)
)
SELECT d, n, sum(n) OVER (ORDER BY d) FROM dailydata;

0

CREATE TABLE links
    ( created_at timestamp
    , active     boolean
    );
INSERT INTO links (created_at, active) VALUES
  ('2017-08-12 15:46:01', false)
, ('2017-08-13 15:46:01', true)
, ('2017-08-14 15:46:01', true)
, ('2017-08-15 15:46:01', false)
;

WITH cal AS (
    SELECT gs::date AS deet
    FROM generate_series('2017-08-11'::date, '2017-08-16'::date, '1 day'::interval) gs
    )
SELECT cal.deet
    -- window ordered by the calendar day, so days without links stay in their place
    , SUM(1) FILTER (WHERE l.active = true) OVER (ORDER BY cal.deet) AS cumsum
FROM cal
LEFT JOIN links l ON date_trunc('day', l.created_at) = cal.deet
ORDER BY cal.deet
    ;

1

Demo

http://rextester.com/OGZV44492

SQL

SELECT date_trunc('day', dd):: date AS day, 
     (SELECT COUNT(*) FROM links 
     WHERE active = true 
      AND date(created_at) <= date_trunc('day', dd)) AS "count" 
FROM generate_series 
    ('2017-08-12'::timestamp 
    , '2017-08-17'::timestamp 
    , '1 day'::interval) dd 

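Explanation

The SQL above uses a simple correlated subselect: for each date in the generated range, it counts the active rows in the links table whose date part is less than or equal to that date.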
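For larger tables, the same subselect idea can be written so that the comparison is on the bare created_at column, which lets PostgreSQL consider an index on it. The following is a sketch under assumptions, not part of the original answer: the temporary table only reproduces the sample data so the example runs on its own, and the index name links_active_created_at_idx is hypothetical. The subselect still runs once per generated day.

```sql
-- Sample data from the question, in a temporary table so the sketch runs on its own.
CREATE TEMP TABLE links (created_at timestamp, active boolean);
INSERT INTO links (created_at, active) VALUES
  ('2017-08-12 15:46:01', false)
, ('2017-08-13 15:46:01', true)
, ('2017-08-14 15:46:01', true)
, ('2017-08-15 15:46:01', false);

-- Hypothetical partial index over active rows only (the name is an assumption).
CREATE INDEX links_active_created_at_idx ON links (created_at) WHERE active;

SELECT dd::date AS day
     , (SELECT count(*)
        FROM   links
        WHERE  active
        -- "created on this day or earlier", without wrapping created_at in a function
        AND    created_at < dd + interval '1 day') AS "count"
FROM   generate_series(timestamp '2017-08-12'
                     , timestamp '2017-08-17'
                     , interval  '1 day') dd
ORDER  BY 1;
```

Comparing created_at against the start of the next day is equivalent to date(created_at) <= day, so the counts match the original query.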

+1

I really like this one! Thanks, Steve.

+0

I realized that this query does not scale well to very large tables, so I went with Erwin's answer.
