2016-09-28 27 views
1

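I have a user traffic table, and I need to get the gain/loss in new users compared with the previous day. I am just wondering whether there is a better way to do this than the solution below.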

Schema:

Table structure: session_id, session_day, user_id, product_id

What I have tried:

SELECT session_day,
       session_count,
       user_count - LAG(user_count, 1) OVER (ORDER BY session_day) AS gain_loss_users
  FROM (
        SELECT session_day,
               COUNT(session_id) AS session_count,
               COUNT(user_id) AS user_count
          FROM user_traffic
         GROUP BY session_day
       ) X;
+0

Looks solid to me... – JohnHC

+1

What identifies a customer as "new" or "lost", based only on the four table columns you have shown? – mathguy

+0

There is no other way to determine whether a user is a first-time user or a returning user. The "new" part of the question confuses me... – Teja

Answers

1

I tried to address the question of "new" and "returning" users. Here is my attempt:

select session_day,
       count(distinct user_id) as user_cnt,
       count(distinct user_id) - lag(count(distinct user_id))
                                   over (order by session_day) as gain,
       count(newu) as newu,
       count(returnu) as returnu
  from (
        select session_id,
               session_day,
               user_id,
               -- 1 on the user's first session in the table, otherwise NULL
               case when count(*) over (partition by user_id
                                        order by session_day, session_id
                                        rows between unbounded preceding and current row) = 1
                    then 1
               end as newu,
               -- 1 when the user's previous session day differs from the day of the previous row overall
               case when lag(session_day, 1) over (partition by user_id order by session_day, session_id)
                         <>
                         lag(session_day, 1) over (order by session_day, session_id)
                    then 1
               end as returnu
          from user_traffic u
       )
 group by session_day
 order by session_day;

Test data and output:

create table user_traffic (session_id number(6), session_day date, 
          user_id number(6), product_id number(6)); 

insert into user_traffic values ( 1, date '2016-09-07', 101, 1); 
insert into user_traffic values ( 2, date '2016-09-07', 101, 4); 
insert into user_traffic values ( 3, date '2016-09-07', 102, 1); 
insert into user_traffic values ( 4, date '2016-09-08', 101, 2); 
insert into user_traffic values ( 5, date '2016-09-08', 101, 4); 
insert into user_traffic values ( 6, date '2016-09-09', 102, 1); 
insert into user_traffic values ( 7, date '2016-09-10', 102, 1); 
insert into user_traffic values ( 8, date '2016-09-10', 103, 3); 

SESSION_DAY        CNT       GAIN        NEW    RETURNS
----------- ---------- ---------- ---------- ----------
2016-09-07           2                     2          0  -- 101 & 102 are new
2016-09-08           1         -1          0          0
2016-09-09           1          0          0          1  -- 102 returned
2016-09-10           2          1          1          0  -- 103 is new
+0

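This looks great. I just wanted to add to your answer regarding ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: I am not sure which database you used to generate this output. – Teja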

0

There isn't a better way, but there is a more concise one. You can mix window functions with aggregate functions:

SELECT session_day,
       COUNT(session_id) AS session_count,
       COUNT(DISTINCT user_id) AS user_count,
       (COUNT(DISTINCT user_id) -
        LAG(COUNT(DISTINCT user_id)) OVER (ORDER BY session_day)
       ) AS gain_loss_users
  FROM user_traffic
 GROUP BY session_day;

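I assume you want COUNT(DISTINCT) because (1) a user can have multiple sessions on the same day and (2) without DISTINCT the two counts are identical (if user_id and session_id are never NULL).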
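
To make the difference concrete, here is a minimal sketch that runs both counts side by side on the test data from the other answer. It is an illustration only: it assumes an in-memory SQLite database (3.25 or later, for window functions) driven from Python as a stand-in for the asker's database, stores session_day as text, and computes the daily counts in a CTE named `daily` rather than nesting the aggregate inside LAG(). The column names `raw_count`, `gain_raw`, and `gain_distinct` are made up for this example.

```python
import sqlite3

# Rows copied from the test data in this thread:
# (session_id, session_day, user_id, product_id)
ROWS = [
    (1, "2016-09-07", 101, 1),
    (2, "2016-09-07", 101, 4),
    (3, "2016-09-07", 102, 1),
    (4, "2016-09-08", 101, 2),
    (5, "2016-09-08", 101, 4),
    (6, "2016-09-09", 102, 1),
    (7, "2016-09-10", 102, 1),
    (8, "2016-09-10", 103, 3),
]

# Assumption: SQLite stands in for the real database; dates are ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_traffic "
    "(session_id INTEGER, session_day TEXT, user_id INTEGER, product_id INTEGER)"
)
conn.executemany("INSERT INTO user_traffic VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
WITH daily AS (
    SELECT session_day,
           COUNT(user_id)          AS raw_count,   -- one per session row
           COUNT(DISTINCT user_id) AS user_count   -- one per user per day
      FROM user_traffic
     GROUP BY session_day
)
SELECT session_day,
       raw_count,
       user_count,
       raw_count  - LAG(raw_count)  OVER (ORDER BY session_day) AS gain_raw,
       user_count - LAG(user_count) OVER (ORDER BY session_day) AS gain_distinct
  FROM daily
 ORDER BY session_day
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On this data the two gain columns disagree for 2016-09-09: the raw count drops from 2 to 1 because user 101 had two sessions the day before, while the number of distinct users stays at 1.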

+0

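You need to remove "PARTITION BY session_day" from the LAG(). It should not be partitioned by that column, because it is already ordered by it and the query already groups on it. If it is left in, the LAG result is NULL in SQL Server. – Matt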
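
A small sketch of why that happens, under the assumption that SQLite (3.25 or later, run from Python) behaves like SQL Server here: after grouping there is one row per session_day, so a partition on session_day holds a single row and LAG() has no previous row to look at. The names `prev_partitioned` and `prev_ordered` are invented for this illustration, and the rows are a subset of the test data in this thread.

```python
import sqlite3

# Subset of the thread's test data: (session_id, session_day, user_id, product_id)
ROWS = [
    (1, "2016-09-07", 101, 1),
    (3, "2016-09-07", 102, 1),
    (4, "2016-09-08", 101, 2),
    (6, "2016-09-09", 102, 1),
    (7, "2016-09-10", 102, 1),
    (8, "2016-09-10", 103, 3),
]

# Assumption: in-memory SQLite as a stand-in; dates stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_traffic "
    "(session_id INTEGER, session_day TEXT, user_id INTEGER, product_id INTEGER)"
)
conn.executemany("INSERT INTO user_traffic VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
WITH daily AS (
    SELECT session_day, COUNT(DISTINCT user_id) AS user_count
      FROM user_traffic
     GROUP BY session_day
)
SELECT session_day,
       user_count,
       -- each partition contains exactly one row, so there is nothing to lag to
       LAG(user_count) OVER (PARTITION BY session_day ORDER BY session_day) AS prev_partitioned,
       -- no partition: the previous day's row is visible
       LAG(user_count) OVER (ORDER BY session_day) AS prev_ordered
  FROM daily
 ORDER BY session_day
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

With the partition in place every `prev_partitioned` value is NULL, so any gain computed from it would be NULL as well.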

+1

@Matt . . . Thank you. –

+0

How do I get the gain/loss figure for new users? – Teja
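
One possible reading, sketched below, treats a user as "new" on the day of their earliest session in the table and then applies LAG() to the daily count of new users. This is an assumption about what "new" means, not something the thread settles, and it is a different technique from the running-count CASE expression in the first answer. The sketch uses SQLite (3.25 or later) from Python as a stand-in database; `first_seen`, `days`, `daily_new`, and `new_user_gain` are hypothetical names.

```python
import sqlite3

# Rows copied from the test data in this thread:
# (session_id, session_day, user_id, product_id)
ROWS = [
    (1, "2016-09-07", 101, 1),
    (2, "2016-09-07", 101, 4),
    (3, "2016-09-07", 102, 1),
    (4, "2016-09-08", 101, 2),
    (5, "2016-09-08", 101, 4),
    (6, "2016-09-09", 102, 1),
    (7, "2016-09-10", 102, 1),
    (8, "2016-09-10", 103, 3),
]

# Assumption: in-memory SQLite as a stand-in; dates stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_traffic "
    "(session_id INTEGER, session_day TEXT, user_id INTEGER, product_id INTEGER)"
)
conn.executemany("INSERT INTO user_traffic VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
WITH first_seen AS (          -- earliest day each user appears
    SELECT user_id, MIN(session_day) AS first_day
      FROM user_traffic
     GROUP BY user_id
),
days AS (                     -- every day that has at least one session
    SELECT DISTINCT session_day FROM user_traffic
),
daily_new AS (                -- LEFT JOIN keeps days with zero new users
    SELECT d.session_day, COUNT(f.user_id) AS new_users
      FROM days d
      LEFT JOIN first_seen f ON f.first_day = d.session_day
     GROUP BY d.session_day
)
SELECT session_day,
       new_users,
       new_users - LAG(new_users) OVER (ORDER BY session_day) AS new_user_gain
  FROM daily_new
 ORDER BY session_day
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Note that a calendar day with no sessions at all would not appear in `days`, so the "previous day" here means the previous day that has traffic.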