2014-11-20

Reactivation SQL

I have the following query:

with t as (
     -- monthly spend per advertisable (cost / 1e6, rounded), limited to
     -- advertisables whose total SUM(cost)/1e6 exceeds 100
     SELECT advertisable, EXTRACT(YEAR from day) as yy, EXTRACT(MONTH from day) as mon,
      ROUND(SUM(cost)/1e6) as val
     FROM adcube dac
     WHERE advertisable IN (SELECT advertisable
           FROM adcube dac
           GROUP BY advertisable
           HAVING SUM(cost)/1e6 > 100
           )
     GROUP BY advertisable, EXTRACT(YEAR from day), EXTRACT(MONTH from day)
    )
select advertisable, min(yy * 100 + mon) as yyyymm  -- first month of the zero run
from (select t.*,
      -- grp stays constant along a run of consecutive months with the same val
      (row_number() over (partition by advertisable order by yy, mon) -
       row_number() over (partition by advertisable, val order by yy, mon)
      ) as grp
     from t
    ) as foo
group by advertisable, grp, val
having count(*) >= 6 and val = 0
;

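This tracks the date on which an account went inactive, i.e. stopped spending for 4 months. However, I want to track the reactivation date: if an account starts spending again after those 4 months, how can I see the account's new start date?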
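The `grp` expression in the query above is the classic gaps-and-islands trick: the difference between the two `row_number()` values stays constant along a run of consecutive months with the same `val`. Below is a minimal sketch of what the query returns, assuming Python's built-in `sqlite3` module with SQLite 3.25 or later (needed for window functions), an invented table `t` that already holds one row per advertisable and month (standing in for the CTE), and `yy * 100 + mon` as the `yyyymm` key. The sample data is hypothetical.

```python
import sqlite3

# Hypothetical monthly values standing in for the CTE t: (advertisable, yy, mon, val).
# Account 'a' spends, stays at 0 for 6 months, then spends again.
ROWS = [
    ("a", 2014, 1, 3), ("a", 2014, 2, 5),
    ("a", 2014, 3, 0), ("a", 2014, 4, 0), ("a", 2014, 5, 0),
    ("a", 2014, 6, 0), ("a", 2014, 7, 0), ("a", 2014, 8, 0),
    ("a", 2014, 9, 2),
    ("b", 2014, 1, 4), ("b", 2014, 2, 0), ("b", 2014, 3, 7),
]

QUERY = """
SELECT advertisable, MIN(yy * 100 + mon) AS yyyymm
FROM (
    SELECT t.*,
           -- constant within each run ("island") of equal val values
           ROW_NUMBER() OVER (PARTITION BY advertisable ORDER BY yy, mon)
         - ROW_NUMBER() OVER (PARTITION BY advertisable, val ORDER BY yy, mon) AS grp
    FROM t
) AS foo
GROUP BY advertisable, grp, val
HAVING COUNT(*) >= 6 AND val = 0
"""

def first_month_of_long_zero_runs(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (advertisable TEXT, yy INT, mon INT, val INT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(first_month_of_long_zero_runs(ROWS))  # [('a', 201403)]
```

The query only ever reports where a long zero run *starts*; nothing in it looks at the month after the run ends, which is why the reactivation date needs a different approach.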


Please *always* provide the table definitions for the tables you are using, and your Postgres version. – 2014-11-20 18:10:06

Answers


You want to find records with val > 0 that are preceded by 4 (or 6) records with val = 0.

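Here is one idea:

  • Compute the groups of equal values, as in your query.
  • Assign each row a sequence number within its group (val_seqnum).
  • Then pull out the previous value and the previous sequence number for each record.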

Now you want the records that meet all of the following conditions:

  • val > 0
  • prev_val = 0
  • prev_val_seqnum >= 4 (or whatever your threshold is).
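The three steps and the three conditions can be traced without a database. This is a plain-Python sketch for a single account whose monthly values are already in month order; the function name `reactivation_indexes` is hypothetical, it returns list positions rather than dates, and it only mirrors the logic described above rather than reproducing the answer's SQL.

```python
from typing import Dict, List, Tuple

def reactivation_indexes(vals: List[int], threshold: int = 4) -> List[int]:
    """Return positions i where vals[i] > 0, vals[i-1] == 0, and the zero run
    ending at i-1 is at least `threshold` months long."""
    # Step 1: group id, like row_number() - row_number() over (partition by val).
    seen_per_val: Dict[int, int] = {}
    grp: List[int] = []
    for i, v in enumerate(vals):
        seen_per_val[v] = seen_per_val.get(v, 0) + 1
        grp.append((i + 1) - seen_per_val[v])

    # Step 2: val_seqnum, the row's position within its (val, grp) group.
    per_group: Dict[Tuple[int, int], int] = {}
    val_seqnum: List[int] = []
    for v, g in zip(vals, grp):
        per_group[(v, g)] = per_group.get((v, g), 0) + 1
        val_seqnum.append(per_group[(v, g)])

    # Step 3 and the filter: look at the previous row, like lag().
    result = []
    for i in range(1, len(vals)):
        prev_val, prev_val_seqnum = vals[i - 1], val_seqnum[i - 1]
        if vals[i] > 0 and prev_val == 0 and prev_val_seqnum >= threshold:
            result.append(i)
    return result

print(reactivation_indexes([3, 0, 0, 0, 0, 2, 0, 0, 5]))  # [5]
```

Position 5 qualifies because it follows four zeros; position 8 does not, since only two zeros precede it.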

The following query should do it (assuming t means the same as in your query):

select t.*
from (select t.*,
             lag(val) over (partition by advertisable order by yy, mon) as prev_val,
             lag(val_seqnum) over (partition by advertisable order by yy, mon) as prev_val_seqnum
      from (select t.*,
                   -- position of the row within its run of equal values
                   row_number() over (partition by advertisable, val, grp order by yy, mon) as val_seqnum
            from (select t.*,
                         (row_number() over (partition by advertisable order by yy, mon) -
                          row_number() over (partition by advertisable, val order by yy, mon)
                         ) as grp
                  from t  -- t is the CTE from the question
                 ) t
           ) t
     ) t
where val > 0 and prev_val = 0 and prev_val_seqnum >= 4;
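A minimal sketch that runs the same nesting (group id, then `val_seqnum`, then `lag()`) against an in-memory SQLite database, assuming SQLite 3.25+ for window functions and an invented table `t` with one row per advertisable and month. The subquery aliases are renamed `g`, `s`, `p` here only for readability, and `ORDER BY` is added so the output is deterministic.

```python
import sqlite3

# Hypothetical monthly values standing in for the CTE t: (advertisable, yy, mon, val).
ROWS = [
    ("a", 2014, 1, 3), ("a", 2014, 2, 0), ("a", 2014, 3, 0),
    ("a", 2014, 4, 0), ("a", 2014, 5, 0), ("a", 2014, 6, 2),  # back after 4 zero months
    ("b", 2014, 1, 1), ("b", 2014, 2, 0), ("b", 2014, 3, 0),
    ("b", 2014, 4, 6),                                        # only 2 zero months
]

QUERY = """
SELECT advertisable, yy, mon
FROM (SELECT s.*,
             LAG(val)        OVER (PARTITION BY advertisable ORDER BY yy, mon) AS prev_val,
             LAG(val_seqnum) OVER (PARTITION BY advertisable ORDER BY yy, mon) AS prev_val_seqnum
      FROM (SELECT g.*,
                   ROW_NUMBER() OVER (PARTITION BY advertisable, val, grp ORDER BY yy, mon) AS val_seqnum
            FROM (SELECT t.*,
                         ROW_NUMBER() OVER (PARTITION BY advertisable ORDER BY yy, mon)
                       - ROW_NUMBER() OVER (PARTITION BY advertisable, val ORDER BY yy, mon) AS grp
                  FROM t) AS g
           ) AS s
     ) AS p
WHERE val > 0 AND prev_val = 0 AND prev_val_seqnum >= 4
ORDER BY advertisable, yy, mon
"""

def reactivations(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (advertisable TEXT, yy INT, mon INT, val INT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(reactivations(ROWS))  # [('a', 2014, 6)]
```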

I think this can be radically simpler (and faster):

SELECT advertisable, ym AS reactivation_ym 
FROM (
    SELECT advertisable 
     , date_trunc('month', day) AS ym 
     , SUM(cost) < 500000  AS asleep 
     , count(SUM(cost) < 500000 OR NULL) 
       OVER (PARTITION BY advertisable 
         ORDER BY date_trunc('month', day) 
         ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) AS ct 
    FROM adcube dac 
    JOIN (
     SELECT advertisable 
     FROM adcube 
     GROUP BY 1 
     HAVING SUM(cost) > 1e8 -- = 100000000, same as SUM(cost)/1e6 > 100; really?
    ) x USING (advertisable) 
    GROUP BY 1, 2 
    ) sub 
WHERE NOT asleep 
AND ct = 4; 
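Here is a minimal sketch of the same idea in SQLite, run from Python with invented data. The assumptions: `strftime('%Y-%m', day)` stands in for Postgres's `date_trunc('month', day)`, the monthly sums are computed in a subquery rather than inside the window call, the big-spender `JOIN` is left out, and the table layout is hypothetical, not the real `adcube` definition.

```python
import sqlite3

# Hypothetical raw rows (advertisable, day, cost); not the real adcube schema.
ADCUBE = [
    ("a", "2014-01-15", 900000), ("a", "2014-02-03", 100000),
    ("a", "2014-03-10", 0),      ("a", "2014-04-22", 250000),
    ("a", "2014-05-05", 50000),
    ("a", "2014-06-01", 700000), ("a", "2014-06-20", 300000),  # spending again
    ("b", "2014-01-02", 800000), ("b", "2014-02-02", 0),
    ("b", "2014-03-02", 0),      ("b", "2014-04-02", 0),
    ("b", "2014-05-02", 600000),                               # only 3 quiet months
]

QUERY = """
SELECT advertisable, ym AS reactivation_ym
FROM (
    SELECT advertisable, ym,
           month_cost < 500000 AS asleep,
           -- x OR NULL is NULL when x is false, so COUNT sees only quiet months
           COUNT(month_cost < 500000 OR NULL)
               OVER (PARTITION BY advertisable ORDER BY ym
                     ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) AS ct
    FROM (SELECT advertisable, strftime('%Y-%m', day) AS ym, SUM(cost) AS month_cost
          FROM adcube
          GROUP BY advertisable, strftime('%Y-%m', day)) AS monthly
) AS sub
WHERE NOT asleep AND ct = 4
ORDER BY advertisable, reactivation_ym
"""

def reactivation_months(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE adcube (advertisable TEXT, day TEXT, cost INTEGER)")
    con.executemany("INSERT INTO adcube VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(reactivation_months(ADCUBE))  # [('a', '2014-06')]
```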

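This builds on some assumptions to fill in for missing information.
I basically unraveled your calculation and simplified the code, which makes it shorter and faster than your original.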

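  • Count, for each advertisable, how many of the preceding 4 months had a total cost below 500,000. A row qualifies only if all 4 (existing) preceding months are below that threshold and the current month is not. (If you don't have rows for every month, you need to decide how to handle missing rows. That information is not in your question.)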
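The caveat about missing rows matters because a `ROWS` frame counts rows, not calendar months. A plain-Python sketch of the `ct` column for one account (the helper name `quiet_count` is hypothetical) shows this: when a month has no row at all, the frame simply reaches one row further back.

```python
from typing import List

THRESHOLD = 500000  # "asleep" cutoff from the answer's query

def quiet_count(month_costs: List[int]) -> List[int]:
    """Mimic count(cost < THRESHOLD OR NULL) OVER (ORDER BY month
    ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) for a single account.

    month_costs holds one total per *existing* month row, in month order;
    a month with no row is simply absent from the list.
    """
    counts = []
    for i in range(len(month_costs)):
        frame = month_costs[max(0, i - 4):i]  # up to 4 rows before row i
        counts.append(sum(1 for c in frame if c < THRESHOLD))
    return counts

# Jan..Jun, every month present: June sees 4 quiet months before it.
print(quiet_count([900000, 0, 0, 0, 0, 800000]))  # [0, 0, 1, 2, 3, 4]
# Same account without a February row: June's frame reaches back to January,
# so it sees only 3 quiet months and would not qualify.
print(quiet_count([900000, 0, 0, 0, 800000]))     # [0, 0, 1, 2, 3]
```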

Use count() as a window aggregate function with a custom frame. Here is a closely related recent answer with a detailed explanation:

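How can you "nest" count() and sum()?
They are not really nested: count() here is a window function applied to the result of the aggregate function sum(). Details: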
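Since `GROUP BY` and `sum()` are evaluated before window functions, `count() OVER (...)` only ever sees one row per advertisable and month. The following plain-Python sketch of those two phases uses a hypothetical function name and invented rows. It shows why the order matters: two June entries of 300,000 are each below the cutoff, but their monthly total is not.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

THRESHOLD = 500000  # "asleep" cutoff from the answer's query

def monthly_totals_and_ct(
    rows: List[Tuple[str, str, int]],
) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """rows: (advertisable, 'YYYY-MM', cost), possibly several per month.

    Returns {(advertisable, month): (monthly total, ct)}.
    """
    # Phase 1, the aggregate: SUM(cost) ... GROUP BY advertisable, month.
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for adv, ym, cost in rows:
        totals[(adv, ym)] += cost

    # Phase 2, the window function over the grouped rows:
    # PARTITION BY advertisable ORDER BY month.
    partitions: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for (adv, ym), total in sorted(totals.items()):
        partitions[adv].append((ym, total))

    result: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for adv, months in partitions.items():
        for i, (ym, total) in enumerate(months):
            frame = months[max(0, i - 4):i]  # ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
            ct = sum(1 for _, t in frame if t < THRESHOLD)
            result[(adv, ym)] = (total, ct)
    return result

SAMPLE = [
    ("a", "2014-01", 900000), ("a", "2014-02", 100000),
    ("a", "2014-03", 0), ("a", "2014-04", 0), ("a", "2014-05", 0),
    ("a", "2014-06", 300000), ("a", "2014-06", 300000),
]
print(monthly_totals_and_ct(SAMPLE)[("a", "2014-06")])  # (600000, 4)
```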


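Thanks, this is faster. So when you say 'ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING', does it check for 4 months without spend, and then check 1 month later whether it is still 0? – user3207341 2014-11-21 10:38:38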


@user3207341: Actually, it checks whether the *current* row is spending again. I amended the answer. – 2014-11-21 13:50:41