2012-09-07 68 views
1

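PostgreSQL: find the difference between the last three login dates

For each customer, I want to take the last three login dates and find those customers for whom the interval between the third-most-recent login (login3) and the most recent login (login1) is more than 4 days.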

The "activity" table contains:

  • user_id
  • login_date, in DATETIME format, but the time is always 00:00:00
  • (and some other fields that are not relevant to the problem)

I tried several queries, but none of them worked properly.

+1

Which version of PostgreSQL are you using? If it is >= 8.4, a solution involving [window functions](http://www.postgresql.org/docs/current/static/tutorial-window.html) is possible. –

+0

I think it's 9.1. Could you explain in more detail? I am a new Postgre user. – Boris

+3

@Boris: It's called PostgreSQL, or Postgres for short. There is no such thing as 'Postgre'. If you are not sure about the version, ask the server: 'SELECT version();' –

Answers

3

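Here is a solution using arrays that should work on PostgreSQL 8.3 and later (note that the built-in array_agg() aggregate arrived in 8.4, so 8.3 would need a user-defined equivalent).

Generate test data. Change the second argument of generate_series() to add more activity records: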

create table activity (id serial primary key, user_id integer, login_date timestamp);
insert into activity (user_id, login_date)
select * from
(
    select round(random()*10)::integer as user_id,
           ('2012-01-01'::date + (round(random()*300)) * '1 day'::interval) as login_date
    from (select generate_series(1,1000)) foo
) fooger
order by login_date;

select * from activity; 

Query the required data:

--show last three login dates per user: 
select user_id, login[1] as login1, login[2] as login2, login[3] as login3 
from 
(
select user_id, array_agg(login_date) as login from 
(select * from activity order by user_id,login_date desc) foo 
group by user_id 
) foo; 

--shake out those who haven't been visiting frequently enough 
select user_id, login[1] as login1, login[2] as login2, login[3] as login3, (login[1] - coalesce(login[3],login[2],login[1]))::interval as diff 
from 
(
select user_id, array_agg(login_date) as login from 
(select * from activity order by user_id,login_date desc) foo 
group by user_id 
) foo 
where login[1] - coalesce(login[3],login[2],login[1]) > '4 days'::interval; 
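
The queries above rely on the ordering of the inner subquery carrying through to array_agg(). That usually works, but the ordering can also be stated inside the aggregate itself. A minimal sketch, assuming PostgreSQL 9.0 or later (where aggregates accept their own ORDER BY); the table activity_sample and its rows are hypothetical and exist only to make the example self-contained:

```sql
-- Hypothetical sample data, used only for this sketch.
create temp table activity_sample (user_id integer, login_date timestamp);
insert into activity_sample (user_id, login_date) values
    (1, '2012-03-01'), (1, '2012-03-02'), (1, '2012-03-10'),  -- 9 days between login3 and login1
    (2, '2012-03-01'), (2, '2012-03-02'), (2, '2012-03-03');  -- 2 days between login3 and login1

-- The sort order is given inside the aggregate (PostgreSQL 9.0+ syntax),
-- so it no longer depends on the ordering of a subquery.
select user_id,
       login[1] as login1,
       login[2] as login2,
       login[3] as login3,
       login[1] - coalesce(login[3], login[2], login[1]) as diff
from (
    select user_id,
           array_agg(login_date order by login_date desc) as login
    from activity_sample
    group by user_id
) per_user
where login[1] - coalesce(login[3], login[2], login[1]) > '4 days'::interval;
```

With this sample data only user 1 is returned, because user 2's three logins fall within two days. To run it against the real data, replace activity_sample with activity.
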
+0

+1 Old school, but very effective. –

+0

Thanks, Joshua Berry. I will test it as soon as I have "create" and "insert" privileges. – Boris

2

I use a simplified version of the setup provided by @Joshua:

CREATE TEMP TABLE activity (id serial primary key, user_id integer 
               , login_date timestamp); 
INSERT INTO activity (user_id, login_date) 
SELECT * FROM (
    SELECT round(random()*10)::int AS user_id 
     , ('2012-01-01 0:0'::timestamp + random() * interval '365 days') AS ts 
    FROM generate_series(1,1000) 
    ) g 
ORDER BY ts; 

You can use window functions, available since PostgreSQL 8.4:

SELECT user_id, login1, login3, (login1 - login3) AS time_span 
FROM (
    SELECT user_id, login_date 
     ,first_value(login_date)  OVER w AS login1 
     ,COALESCE(lead(login_date, 2) OVER w 
        ,lead(login_date) OVER w) AS login3 
    FROM activity 
    WINDOW w AS (PARTITION BY user_id ORDER BY login_date DESC) 
    ) x 
WHERE login_date = login1 
AND (login1 - login3) > interval '4d'; 

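It is easier to read IMO, but in a quick test @Joshua's query was ~30% faster.

  • Users with only one entry never qualify.
  • For users with only two entries, the second-to-last entry is used instead of the third-to-last.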
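
The two edge cases above can be checked on a small data set. This is only a sketch: the table activity_edge and its rows are hypothetical, and the 4-day filter is left out so that the value of login3 is visible for every user.

```sql
-- Hypothetical sample data: user 1 has one login, user 2 has two, user 3 has three.
CREATE TEMP TABLE activity_edge (user_id integer, login_date timestamp);
INSERT INTO activity_edge (user_id, login_date) VALUES
    (1, '2012-05-20')
  , (2, '2012-05-01'), (2, '2012-05-20')
  , (3, '2012-05-01'), (3, '2012-05-10'), (3, '2012-05-20');

-- Same window logic as the query above, minus the 4-day filter.
SELECT user_id, login1, login3, (login1 - login3) AS time_span
FROM (
    SELECT user_id, login_date
         , first_value(login_date) OVER w AS login1
         , COALESCE(lead(login_date, 2) OVER w
                  , lead(login_date)    OVER w) AS login3
    FROM activity_edge
    WINDOW w AS (PARTITION BY user_id ORDER BY login_date DESC)
    ) x
WHERE login_date = login1
ORDER BY user_id;
```

User 1 gets NULL for login3 and time_span, so a comparison such as time_span > interval '4d' is never true for that user. User 2 falls back to its second login (2012-05-01), and user 3 uses its third.
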

As an aside: if the time part of the timestamp is always 00:00:00, you may want to consider using a date column instead of timestamp.
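
A rough sketch of what that change could look like, using a hypothetical table activity_by_day so the real table is not touched. One consequence worth knowing: subtracting two date values yields an integer number of days, not an interval, so a filter would compare against 4 rather than interval '4 days'.

```sql
-- Hypothetical table name and rows, used only for this sketch.
CREATE TEMP TABLE activity_by_day (user_id integer, login_date timestamp);
INSERT INTO activity_by_day (user_id, login_date) VALUES
    (1, '2012-03-01 00:00:00'), (1, '2012-03-10 00:00:00');

-- Convert the column; the cast drops the (always zero) time part.
ALTER TABLE activity_by_day
    ALTER COLUMN login_date TYPE date USING login_date::date;

-- date - date yields an integer number of days rather than an interval.
SELECT user_id, max(login_date) - min(login_date) AS days_between
FROM activity_by_day
GROUP BY user_id;
```

For the sample rows, days_between is 9.
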

+0

Thanks again, Erwin. – Boris

+0

@Boris: Added a hint about the data type. –

+0

(For bonus points) IMHO, you could replace the temp table activity with a CTE. – wildplasser
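
One way this suggestion might look, as a sketch only: the random test data is generated inside a CTE named activity, and the window-function query from the answer reads from that CTE instead of a temp table. The column alias login_date inside the CTE is chosen here to match the table definition; because the data is random, the rows returned differ on every run.

```sql
-- The CTE named activity stands in for the temp table; nothing is created or dropped.
WITH activity AS (
    SELECT round(random()*10)::int AS user_id
         , '2012-01-01 0:0'::timestamp + random() * interval '365 days' AS login_date
    FROM generate_series(1,1000)
    )
SELECT user_id, login1, login3, (login1 - login3) AS time_span
FROM (
    SELECT user_id, login_date
         , first_value(login_date) OVER w AS login1
         , COALESCE(lead(login_date, 2) OVER w
                  , lead(login_date)    OVER w) AS login3
    FROM activity
    WINDOW w AS (PARTITION BY user_id ORDER BY login_date DESC)
    ) x
WHERE login_date = login1
AND (login1 - login3) > interval '4d';
```

If a real table called activity exists in the same session, the CTE name takes precedence within this statement, so the query runs on the generated data only.
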

0

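For completeness: a naive version (the query plan shows three separate subplans for the three CTEs, which is bad). A recursive CTE should also work ;-) Note that in this query l3 is the most recent login and l1 the third most recent.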

WITH l3 AS (
     SELECT a3.id, a3.user_id, a3.login_date 
     FROM activity a3 
     WHERE NOT EXISTS (SELECT * 
       FROM activity nx 
       WHERE nx.user_id = a3.user_id 
       AND nx.login_date > a3.login_date 
       ) 
     ) 
, l2 AS (
     SELECT a2.id, a2.user_id, a2.login_date 
     FROM activity a2 
     JOIN l3 ON l3.user_id = a2.user_id AND l3.login_date > a2.login_date 
     WHERE NOT EXISTS (SELECT * 
       FROM activity nx 
       WHERE nx.user_id = a2.user_id 
       AND nx.login_date > a2.login_date 
       AND nx.login_date < l3.login_date 
       ) 
     ) 
, l1 AS (
     SELECT a1.id, a1.user_id, a1.login_date 
     FROM activity a1 
     JOIN l2 ON l2.user_id = a1.user_id AND l2.login_date > a1.login_date 
     WHERE NOT EXISTS (SELECT * 
       FROM activity nx 
       WHERE nx.user_id = a1.user_id 
       AND nx.login_date > a1.login_date 
       AND nx.login_date < l2.login_date 
       ) 
     ) 
SELECT l1.user_id 
     ,l1.id AS ii1, l1.login_date AS d1 
     ,l2.id AS ii2, l2.login_date AS d2 
     ,l3.id AS ii3, l3.login_date AS d3 
FROM l1 
JOIN l2 ON l2.user_id = l1.user_id 
JOIN l3 ON l3.user_id = l1.user_id 
WHERE l3.login_date - l1.login_date > '4 days'::INTERVAL 
     ;