2013-05-11

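How to calculate month-over-month retention using SQL

I'm trying to get a basic table that shows retention from one month to the next. So if someone buys something in one month and buys again in the following month, they get counted.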

month, num_transactions, repeat_transactions, retention 
2012-02, 5, 2, 40% 
2012-03, 10, 3, 30% 
2012-04, 15, 8, 53% 

So if everyone who bought last month buys again the next month, you have 100%.
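The retention column in the table above is repeat_transactions divided by num_transactions, expressed as a percentage. A minimal Python sketch of that arithmetic follows; Python is used only to illustrate the calculation, `retention_pct` is a hypothetical helper, and reporting a month with no transactions as 0% is an assumption:

```python
def retention_pct(num_transactions: int, repeat_transactions: int) -> float:
    """Retention = repeat_transactions / num_transactions * 100, rounded to 2 places.

    Assumption: a month with no transactions is reported as 0% instead of
    raising ZeroDivisionError.
    """
    if num_transactions == 0:
        return 0.0
    return round(100.0 * repeat_transactions / num_transactions, 2)


# The sample rows from the question: (month, num_transactions, repeat_transactions)
for month, num, repeat in [("2012-02", 5, 2), ("2012-03", 10, 3), ("2012-04", 15, 8)]:
    print(month, num, repeat, f"{retention_pct(num, repeat)}%")
# prints 40.0%, 30.0% and 53.33% for the three months
```

If every buyer repeats, repeat_transactions equals num_transactions and the function returns 100.0.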

So far I can only calculate this by hand. This gives me the buyers who appear in both months:

select count(*) as num_repeat_buyers from 

(select distinct 
    to_char(transaction.timestamp, 'YYYY-MM') as month, 
    auth_user.email 
from 
    auth_user, 
    transaction 
where 
    auth_user.id = transaction.buyer_id and 
    to_char(transaction.timestamp, 'YYYY-MM') = '2012-03' 
) as table1, 


(select distinct 
    to_char(transaction.timestamp, 'YYYY-MM') as month, 
    auth_user.email 
from 
    auth_user, 
    transaction 
where 
    auth_user.id = transaction.buyer_id and 
    to_char(transaction.timestamp, 'YYYY-MM') = '2012-04' 
) as table2 
where table1.email = table2.email 
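The query above is a set intersection: the distinct buyers of the first month, intersected with the distinct buyers of the second month, then counted. A minimal Python sketch of the same logic, assuming hypothetical in-memory (email, timestamp) pairs in place of the auth_user and transaction tables:

```python
from datetime import datetime

# Hypothetical sample data standing in for auth_user JOIN transaction:
# (buyer email, transaction timestamp)
transactions = [
    ("a@example.com", datetime(2012, 3, 2)),
    ("a@example.com", datetime(2012, 4, 9)),
    ("b@example.com", datetime(2012, 3, 15)),
    ("b@example.com", datetime(2012, 3, 20)),  # two purchases in the same month
    ("c@example.com", datetime(2012, 4, 1)),
]


def buyers_in_month(txs, month):
    """Distinct emails with at least one transaction in month 'YYYY-MM'
    (the SELECT DISTINCT ... WHERE to_char(...) = month subquery)."""
    return {email for email, ts in txs if ts.strftime("%Y-%m") == month}


def num_repeat_buyers(txs, first_month, second_month):
    """Count buyers present in both months (the table1.email = table2.email join)."""
    return len(buyers_in_month(txs, first_month) & buyers_in_month(txs, second_month))


print(num_repeat_buyers(transactions, "2012-03", "2012-04"))  # 1 (only a@example.com)
```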

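This isn't right, but I feel like I could use some Postgres window functions. Keep in mind that window functions don't let you specify a WHERE clause; you mostly only have access to the preceding rows. After re-reading, I changed minus 1 month to plus 1 month: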

select month, count(*) as num_transactions, count(*) over (PARTITION BY month ORDER BY month) 
from 
    (select distinct 
     to_char(transaction.timestamp, 'YYYY-MM') as month, 
     auth_user.email 
    from 
     auth_user, 
     transaction 
    where 
     auth_user.id = transaction.buyer_id 
    order by 
     month 
    ) as transactions_by_month 
group by 
    month 

Where does 'repeat_transactions' come from? Can you post the table definitions? Or maybe even a small sample on http://sqlfiddle.com – 2013-05-11 09:19:02


To get an accurate answer, this question needs sample data. – 2013-05-11 09:54:42

Answers


This uses CASE and EXISTS to get the repeat transactions:

SELECT 
    *, 
    CASE 
     WHEN num_transactions = 0 
     THEN 0 
     ELSE round(100.0 * repeat_transactions/num_transactions, 2) 
    END AS retention 
FROM 
    (
     SELECT 
      to_char(timestamp, 'YYYY-MM') AS month, 
      count(*) AS num_transactions, 
      sum(CASE 
       WHEN EXISTS (
        SELECT 1 
        FROM transaction AS t 
        JOIN auth_user AS u 
        ON t.buyer_id = u.id 
        WHERE 
         date_trunc('month', transaction.timestamp) 
          + interval '1 month' 
          = date_trunc('month', t.timestamp) 
         AND auth_user.email = u.email 
       ) 
       THEN 1 
       ELSE 0 
      END) AS repeat_transactions 
     FROM 
      transaction 
      JOIN auth_user 
      ON transaction.buyer_id = auth_user.id 
     GROUP BY 1 
    ) AS summary 
ORDER BY 1; 

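Edit: after reading the question again, my understanding now is that if someone bought something in 2012-02 and then bought something again in 2012-03, the 2012-02 transaction counts toward that month's retention.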
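Under that reading, each transaction in month M counts as a repeat if the same buyer has any transaction in month M + 1, which is what the CASE WHEN EXISTS subquery checks. A minimal Python sketch of that per-transaction rule, assuming hypothetical (buyer_id, timestamp) sample rows; `forward_retention` and `next_month` are illustrative names, not part of the answer:

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (buyer_id, transaction timestamp)
transactions = [
    (1, datetime(2012, 1, 3)), (1, datetime(2012, 1, 5)), (1, datetime(2012, 1, 7)),
    (1, datetime(2012, 2, 3)), (1, datetime(2012, 3, 5)),
    (2, datetime(2012, 1, 7)), (2, datetime(2012, 3, 7)),
    (3, datetime(2012, 1, 7)), (4, datetime(2012, 2, 7)),
]


def next_month(key):
    """(year, month) plus one calendar month."""
    year, month = key
    return (year + 1, 1) if month == 12 else (year, month + 1)


def forward_retention(txs):
    """Per 'YYYY-MM': (num_transactions, repeat_transactions, retention %).

    A transaction is a repeat if its buyer has any transaction in the next
    calendar month, mirroring the CASE WHEN EXISTS (...) check.
    """
    active = {(buyer, (ts.year, ts.month)) for buyer, ts in txs}
    totals, repeats = Counter(), Counter()
    for buyer, ts in txs:
        key = (ts.year, ts.month)
        totals[key] += 1
        if (buyer, next_month(key)) in active:
            repeats[key] += 1
    return {
        f"{y}-{m:02d}": (totals[(y, m)], repeats[(y, m)],
                         round(100.0 * repeats[(y, m)] / totals[(y, m)], 2))
        for (y, m) in sorted(totals)
    }


for month, row in forward_retention(transactions).items():
    print(month, *row)
# 2012-01 5 3 60.0
# 2012-02 2 1 50.0
# 2012-03 2 0 0.0
```

Note that this counts transactions, so a buyer with three purchases in a retained month contributes three repeats.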


By the way, my solution is much slower than Erwin's, but I'll leave it here for the poor souls stuck with lesser databases that lack 'LAG()'/'LEAD()', such as SQL Server 2008 R2. – sayap 2013-05-12 09:48:04


Given the following test table (which you should have provided):

CREATE TEMP TABLE transaction (buyer_id int, tstamp timestamp); 
INSERT INTO transaction VALUES 
(1,'2012-01-03 20:00') 
,(1,'2012-01-05 20:00') 
,(1,'2012-01-07 20:00') -- multiple transactions this month 
,(1,'2012-02-03 20:00') -- next month 
,(1,'2012-03-05 20:00') -- next month 
,(2,'2012-01-07 20:00') 
,(2,'2012-03-07 20:00') -- not next month 
,(3,'2012-01-07 20:00') -- just once 
,(4,'2012-02-07 20:00'); -- just once 

auth_user is not relevant to the question.
I use tstamp as the column name because I don't use base type names as identifiers.

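I use the window function lag() to identify repeat buyers. To keep it short, I combine aggregate functions and window functions in a single query level. Keep in mind that window functions are applied after aggregate functions.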

WITH t AS (
    SELECT buyer_id 
     ,date_trunc('month', tstamp) AS month 
     ,count(*) AS item_transactions 
     ,lag(date_trunc('month', tstamp)) OVER (PARTITION BY buyer_id 
              ORDER BY date_trunc('month', tstamp)) 
      = date_trunc('month', tstamp) - interval '1 month' 
      OR NULL AS repeat_transaction 
    FROM transaction 
    WHERE tstamp >= '2012-01-01'::date 
    AND tstamp < '2012-05-01'::date -- time range of interest. 
    GROUP BY 1, 2 
    ) 
SELECT to_char(month, 'YYYY-MM') AS month 
     ,sum(item_transactions) AS num_trans 
     ,count(*) AS num_buyers 
     ,count(repeat_transaction) AS repeat_buyers 
     ,round(
      CASE WHEN sum(item_transactions) > 0 
      THEN count(repeat_transaction)/sum(item_transactions) * 100 
      ELSE 0 
      END, 2) AS buyer_retention_pct 
FROM t 
GROUP BY 1 
ORDER BY 1; 

Result:

  month  | num_trans | num_buyers | repeat_buyers | buyer_retention_pct
---------+-----------+------------+---------------+---------------------
 2012-01 |         5 |          3 |             0 |                0.00
 2012-02 |         2 |          2 |             1 |               50.00
 2012-03 |         2 |          2 |             1 |               50.00

I extended your question to show the difference between the number of transactions and the number of buyers.

The OR NULL in repeat_transaction converts FALSE to NULL, so those values are not counted by count() in the next step.

-> SQLfiddle.
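The same two steps (aggregate per buyer and month, then compare each month with the buyer's previous active month, as lag() does) can be traced in plain Python against the rows of the test table above. This is an illustrative sketch, not a replacement for the query; `lag_retention` and `prev_month` are hypothetical names, and the retention denominator deliberately mirrors the query's sum(item_transactions):

```python
from collections import defaultdict
from datetime import datetime

# The rows of the test table above: (buyer_id, tstamp)
transactions = [
    (1, datetime(2012, 1, 3, 20)), (1, datetime(2012, 1, 5, 20)),
    (1, datetime(2012, 1, 7, 20)), (1, datetime(2012, 2, 3, 20)),
    (1, datetime(2012, 3, 5, 20)), (2, datetime(2012, 1, 7, 20)),
    (2, datetime(2012, 3, 7, 20)), (3, datetime(2012, 1, 7, 20)),
    (4, datetime(2012, 2, 7, 20)),
]


def prev_month(key):
    """(year, month) minus one calendar month, like ... - interval '1 month'."""
    year, month = key
    return (year - 1, 12) if month == 1 else (year, month - 1)


def lag_retention(txs, start, end):
    # CTE t: one row per (buyer, month) inside [start, end), with a count.
    item_transactions = defaultdict(int)
    for buyer, ts in txs:
        if start <= ts < end:
            item_transactions[(buyer, (ts.year, ts.month))] += 1

    # lag() OVER (PARTITION BY buyer_id ORDER BY month): compare each active
    # month with the same buyer's previous active month.
    months_by_buyer = defaultdict(list)
    for buyer, month in item_transactions:
        months_by_buyer[buyer].append(month)
    repeat = {}
    for buyer, months in months_by_buyer.items():
        previous = None  # lag() yields NULL on a buyer's first row
        for month in sorted(months):
            repeat[(buyer, month)] = previous == prev_month(month)
            previous = month

    # Outer query: aggregate per month; denominator is transactions, as in the SQL.
    totals = defaultdict(lambda: [0, 0, 0])  # num_trans, num_buyers, repeat_buyers
    for (buyer, (y, m)), n in item_transactions.items():
        row = totals[f"{y}-{m:02d}"]
        row[0] += n
        row[1] += 1
        row[2] += repeat[(buyer, (y, m))]
    return {
        month: (trans, buyers, rep, round(100.0 * rep / trans, 2))
        for month, (trans, buyers, rep) in sorted(totals.items())
    }


for month, row in lag_retention(transactions, datetime(2012, 1, 1), datetime(2012, 5, 1)).items():
    print(month, *row)
# 2012-01 5 3 0 0.0
# 2012-02 2 2 1 50.0
# 2012-03 2 2 1 50.0
```

Unlike the EXISTS approach, this looks backwards: a month counts a buyer as repeat if that buyer also bought in the previous month.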


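Nice. After reading your answer, I found a bug in how I was doing date arithmetic (31 days != 1 month). I especially like how you do the aggregation and the window function in one step. I had never done anything like that. – sayap 2013-05-12 09:42:13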
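The pitfall that 31 days is not one month is easy to see with a date near the end of a month. A short Python sketch: the standard datetime module has no month-aware arithmetic, so `add_one_month` is a hypothetical helper that mirrors date_trunc('month', d) + interval '1 month':

```python
from datetime import date, timedelta


def add_one_month(d):
    """First day of the calendar month after d, the equivalent of
    date_trunc('month', d) + interval '1 month' in PostgreSQL."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


jan_31 = date(2012, 1, 31)
print(jan_31 + timedelta(days=31))  # 2012-03-02: a fixed 31-day step skips February
print(add_one_month(jan_31))        # 2012-02-01: the next calendar month
```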


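Here is how I define the buyer retention percentage for February: the number of unique buyers who bought in both January and February, divided by the number of unique buyers who bought in January. Could you check whether dividing by sum(item_transactions) instead of count(*) is correct? Thanks. – 2013-07-17 19:29:07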


Even when dividing by count(*), I don't think the number is correct. For 2012-01, the buyer retention is 1/3, not 1/2. – 2013-07-25 17:11:39
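The buyer-based definition from these comments (February retention = distinct buyers in both January and February, divided by distinct buyers in January) can be sketched in Python against the rows of the test table above. This illustrates that definition only; `buyer_retention` is a hypothetical name, results are assumed to be keyed by the later month as in the comment, and months with no previous-month buyers are skipped:

```python
from collections import defaultdict
from datetime import datetime

# The rows of the test table above: (buyer_id, tstamp)
transactions = [
    (1, datetime(2012, 1, 3, 20)), (1, datetime(2012, 1, 5, 20)),
    (1, datetime(2012, 1, 7, 20)), (1, datetime(2012, 2, 3, 20)),
    (1, datetime(2012, 3, 5, 20)), (2, datetime(2012, 1, 7, 20)),
    (2, datetime(2012, 3, 7, 20)), (3, datetime(2012, 1, 7, 20)),
    (4, datetime(2012, 2, 7, 20)),
]


def buyer_retention(txs):
    """Per later month M: distinct buyers in both M-1 and M / distinct buyers in M-1, in %."""
    buyers = defaultdict(set)  # (year, month) -> distinct buyer ids
    for buyer, ts in txs:
        buyers[(ts.year, ts.month)].add(buyer)
    result = {}
    for y, m in sorted(buyers):
        prev = (y - 1, 12) if m == 1 else (y, m - 1)
        base = buyers.get(prev, set())
        if not base:
            continue  # no buyers in the previous month: retention undefined
        kept = base & buyers[(y, m)]
        result[f"{y}-{m:02d}"] = round(100.0 * len(kept) / len(base), 2)
    return result


print(buyer_retention(transactions))  # {'2012-02': 33.33, '2012-03': 50.0}
```

On this data the January cohort gives 1/3 (33.33%), the figure the last comment expects, rather than the 50.00 shown for 2012-02 in the query's result.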