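PostgreSQL: find the difference between the three most recent login dates
I would like to get the last three login dates for each customer and find the customers for whom the time span between the third most recent login (login3) and the most recent login (login1) is greater than 4 days.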
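The "activity" table contains:
- user_id
- login_date, in DATETIME format, but the time is always 00:00:00
- (and some other fields that are not relevant to the problem)
I tried several queries, but none of them worked properly.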
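Here is a solution using arrays. It works on PostgreSQL 8.4 and later, where array_agg() is built in.
Generate test data. Change the second argument of generate_series() to add more activity records: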
create table activity (id serial primary key, user_id integer, login_date timestamp);
insert into activity (user_id, login_date)
select * from
(
select round(random()*10)::integer as user_id, ('2012-01-01'::date + (round(random()*300))* '1 day'::interval) as login_date
from
(select generate_series(1,1000)) foo
) fooger order by login_date;
select * from activity;
Query the required data:
--show last three login dates per user:
select user_id, login[1] as login1, login[2] as login2, login[3] as login3
from
(
select user_id, array_agg(login_date) as login from
(select * from activity order by user_id,login_date desc) foo
group by user_id
) foo;
--shake out those who haven't been visiting frequently enough
select user_id, login[1] as login1, login[2] as login2, login[3] as login3, (login[1] - coalesce(login[3],login[2],login[1]))::interval as diff
from
(
select user_id, array_agg(login_date) as login from
(select * from activity order by user_id,login_date desc) foo
group by user_id
) foo
where login[1] - coalesce(login[3],login[2],login[1]) > '4 days'::interval;
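The queries above rely on array_agg() receiving rows in the order produced by the sorted subquery. That usually works, but the ordering is not stated in the aggregate call itself. As a sketch, assuming PostgreSQL 9.0 or later (where an aggregate call accepts its own ORDER BY), the same result can be written with the ordering inside array_agg():

```sql
-- Sketch, assuming PostgreSQL 9.0+ (ORDER BY inside an aggregate call).
-- login[1] is the most recent login, login[3] the third most recent.
SELECT user_id,
       login[1] AS login1,
       login[2] AS login2,
       login[3] AS login3,
       login[1] - coalesce(login[3], login[2], login[1]) AS diff
FROM (
    SELECT user_id,
           array_agg(login_date ORDER BY login_date DESC) AS login
    FROM activity
    GROUP BY user_id
) foo
WHERE login[1] - coalesce(login[3], login[2], login[1]) > interval '4 days';
```

As in the original query, the coalesce() falls back to login2 for users who have only two logins; users with a single login get a difference of zero and are filtered out.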
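+1 Old school, but very effective.
Thanks, Joshua Berry. I will test it as soon as I have CREATE and INSERT privileges. – Boris
I use and simplify the setup provided by @Joshua: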
CREATE TEMP TABLE activity (id serial primary key, user_id integer
, login_date timestamp);
INSERT INTO activity (user_id, login_date)
SELECT * FROM (
SELECT round(random()*10)::int AS user_id
, ('2012-01-01 0:0'::timestamp + random() * interval '365 days') AS ts
FROM generate_series(1,1000)
) g
ORDER BY ts;
You can use window functions, available since PostgreSQL 8.4:
SELECT user_id, login1, login3, (login1 - login3) AS time_span
FROM (
SELECT user_id, login_date
,first_value(login_date) OVER w AS login1
,COALESCE(lead(login_date, 2) OVER w
,lead(login_date) OVER w) AS login3
FROM activity
WINDOW w AS (PARTITION BY user_id ORDER BY login_date DESC)
) x
WHERE login_date = login1
AND (login1 - login3) > interval '4d';
It is easier to read IMO, but in a quick test @Joshua's query was ~30% faster.
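Another way to write the window-function idea, offered only as a sketch and not as the query timed above: number each user's logins from newest to oldest with row_number(), then fold rows 1 and 3 back into one row per user. It assumes the same activity table; the alias rn is made up for illustration.

```sql
-- Sketch: rank logins per user (1 = most recent), then pivot ranks 1 and 3.
SELECT user_id,
       max(CASE WHEN rn = 1 THEN login_date END) AS login1,
       max(CASE WHEN rn = 3 THEN login_date END) AS login3
FROM (
    SELECT user_id, login_date,
           row_number() OVER (PARTITION BY user_id
                              ORDER BY login_date DESC) AS rn
    FROM activity
) x
WHERE rn <= 3
GROUP BY user_id
-- If a user has fewer than three logins, login3 is NULL, the difference
-- is NULL, and the user is dropped by this condition.
HAVING max(CASE WHEN rn = 1 THEN login_date END)
     - max(CASE WHEN rn = 3 THEN login_date END) > interval '4 days';
```

Note the difference in behavior: this variant only reports users with at least three logins, whereas the COALESCE in the query above also considers users with exactly two. Because row_number() assigns distinct ranks even to equal timestamps, it returns one row per user when several logins share the same date.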
As an aside: if the time part of the timestamp is always 00:00:00, you may want to consider using a date column instead of timestamp.
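A sketch of what that change could look like, assuming nothing else depends on login_date being a timestamp. One consequence worth knowing: subtracting two date values yields an integer number of days rather than an interval, so the filter would compare against 4 instead of interval '4 days'.

```sql
-- Convert the column in place; the time part (always 00:00:00 here) is dropped.
ALTER TABLE activity
    ALTER COLUMN login_date TYPE date USING login_date::date;

-- date - date gives an integer count of days.
SELECT date '2012-01-10' - date '2012-01-05' AS days;  -- returns 5
```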
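For completeness: the naive version (the query plan shows three separate subplans for the three CTEs, which is bad). A recursive CTE should also work ;-)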
WITH l3 AS (
SELECT a3.id, a3.user_id, a3.login_date
FROM activity a3
WHERE NOT EXISTS (SELECT *
FROM activity nx
WHERE nx.user_id = a3.user_id
AND nx.login_date > a3.login_date
)
)
, l2 AS (
SELECT a2.id, a2.user_id, a2.login_date
FROM activity a2
JOIN l3 ON l3.user_id = a2.user_id AND l3.login_date > a2.login_date
WHERE NOT EXISTS (SELECT *
FROM activity nx
WHERE nx.user_id = a2.user_id
AND nx.login_date > a2.login_date
AND nx.login_date < l3.login_date
)
)
, l1 AS (
SELECT a1.id, a1.user_id, a1.login_date
FROM activity a1
JOIN l2 ON l2.user_id = a1.user_id AND l2.login_date > a1.login_date
WHERE NOT EXISTS (SELECT *
FROM activity nx
WHERE nx.user_id = a1.user_id
AND nx.login_date > a1.login_date
AND nx.login_date < l2.login_date
)
)
SELECT l1.user_id
,l1.id AS ii1, l1.login_date AS d1
,l2.id AS ii2, l2.login_date AS d2
,l3.id AS ii3, l3.login_date AS d3
FROM l1
JOIN l2 ON l2.user_id = l1.user_id
JOIN l3 ON l3.user_id = l1.user_id
WHERE l3.login_date - l1.login_date > '4 days'::INTERVAL
;
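Which version of PostgreSQL are you using? If it is >= 8.4, you can have a solution involving [window functions](http://www.postgresql.org/docs/current/static/tutorial-window.html).
I think it's 9.1. Could you explain in a bit more detail? I am a new Postgre user. – Boris
@Boris: It is PostgreSQL, or Postgres for short. There is no such thing as 'Postgre'. If you are not sure about the version, ask the server: 'SELECT version();'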