2017-08-12 19 views

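I am trying to find, for every month of every year, the user who spent the most time signed in during that month. For example, which user spent the most time in January, and likewise for each of the other months.

I am using the following data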

    uid activity-time         status
    ... ...................   ........
    1   2016-12-31 16:00:04   sign in 
    1   2016-12-31 21:05:37   sign out 
    2   2016-12-25 18:00:04   sign in 
    2   2016-12-25 20:45:31   sign out 
    7   2016-10-31 13:00:04   sign in 
    7   2016-10-31 16:05:30   sign out 
    1   2016-12-27 17:00:04   sign in 
    1   2016-12-27 19:05:00   sign out 
    2   2016-10-25 18:00:04   sign in 
    2   2016-10-25 20:45:31   sign out 
    4   2017-12-31 16:00:04   sign in 
    4   2017-12-31 21:05:37   sign out 
    3   2017-12-25 18:00:04   sign in 
    3   2017-12-25 20:45:31   sign out 
    7   2017-10-31 16:00:04   sign in 
    7   2017-10-31 21:05:37   sign out 
    3   2017-10-25 18:00:04   sign in 
    3   2017-10-25 20:45:31   sign out 

I expect the following output

uid   year   month   time-spent
...   ....   .....   ..........
1     2016   12      07:10:45
7     2016   10      03:05:34
4     2017   12      05:05:41
7     2017   10      05:05:41

I tried the query below, but I don't know how to specify the sign-in and sign-out conditions

SELECT ETS.* 
FROM (SELECT year(activity-time),month(activity-time), uid, count(uid) as c, 
ROW_NUMBER() OVER (PARTITION BY month(activity-time) ORDER BY COUNT(uid) DESC) as seq 
FROM activity_table 
GROUP BY month(activity-time),year(activity-time), uid 
) ds 
WHERE seq = 1 
ORDER BY c DESC ; 

Should a session that spans two months be split between them?

Answers

0

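You can use a nested query with lag to get the time difference between each sign-in record and its sign-out record.

I don't have HiveQL at hand, so I may be off on some of the specific date/time functions, but the idea is this: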

select yr, 
     mnth, 
     uid, 
     from_unixtime(spent, 'HH:mm:ss') spent  -- 'HH' is the 24-hour clock; 'hh' would be 12-hour
from (
     select year(activity_time) yr, 
       month(activity_time) mnth, 
       uid, 
       sum(spent) spent, 
       row_number() over (partition by year(activity_time), month(activity_time) 
            order by  sum(spent) desc) rn 
     from (
       select uid, 
         activity_time, 
         status, 
         unix_timestamp(activity_time) 
          - lag(unix_timestamp(activity_time)) 
           over (partition by uid order by activity_time) spent 
       from activity_table 
      ) base 
     where status = 'sign out' 
     group by year(activity_time), 
       month(activity_time), 
       uid 
    ) grouped 
where rn = 1; 

Note: I recommend using underscores rather than hyphens in column names (as I did in the SQL above).
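
To try the lag idea without a Hive cluster, here is a minimal sketch that runs the same three steps (pair each sign out with the previous row of the same user, sum per user and month, keep the top row per month) against the sample data from the question. It uses Python's built-in sqlite3 module instead of Hive, so the date functions (strftime) are SQLite's and not HiveQL's, and it assumes the bundled SQLite is version 3.25 or newer so that window functions are available. The function name top_user_per_month and the column alias total are made up for this example.

```python
import sqlite3

# Sample rows copied from the question: (uid, activity_time, status).
ROWS = [
    (1, "2016-12-31 16:00:04", "sign in"), (1, "2016-12-31 21:05:37", "sign out"),
    (2, "2016-12-25 18:00:04", "sign in"), (2, "2016-12-25 20:45:31", "sign out"),
    (7, "2016-10-31 13:00:04", "sign in"), (7, "2016-10-31 16:05:30", "sign out"),
    (1, "2016-12-27 17:00:04", "sign in"), (1, "2016-12-27 19:05:00", "sign out"),
    (2, "2016-10-25 18:00:04", "sign in"), (2, "2016-10-25 20:45:31", "sign out"),
    (4, "2017-12-31 16:00:04", "sign in"), (4, "2017-12-31 21:05:37", "sign out"),
    (3, "2017-12-25 18:00:04", "sign in"), (3, "2017-12-25 20:45:31", "sign out"),
    (7, "2017-10-31 16:00:04", "sign in"), (7, "2017-10-31 21:05:37", "sign out"),
    (3, "2017-10-25 18:00:04", "sign in"), (3, "2017-10-25 20:45:31", "sign out"),
]

QUERY = """
WITH base AS (
    -- Seconds since the previous event of the same user.
    -- Assumption: every 'sign out' directly follows that user's 'sign in'.
    SELECT uid, activity_time, status,
           CAST(strftime('%s', activity_time) AS INTEGER)
             - LAG(CAST(strftime('%s', activity_time) AS INTEGER))
                 OVER (PARTITION BY uid ORDER BY activity_time) AS spent
    FROM activity_table
),
grouped AS (
    -- Only sign-out rows carry a session length; sum them per user and month.
    SELECT strftime('%Y', activity_time) AS yr,
           strftime('%m', activity_time) AS mnth,
           uid,
           SUM(spent) AS total
    FROM base
    WHERE status = 'sign out'
    GROUP BY strftime('%Y', activity_time), strftime('%m', activity_time), uid
),
ranked AS (
    -- Rank users inside each (year, month) by total time, longest first.
    SELECT yr, mnth, uid, total,
           ROW_NUMBER() OVER (PARTITION BY yr, mnth ORDER BY total DESC) AS rn
    FROM grouped
)
SELECT uid, yr, mnth, total FROM ranked WHERE rn = 1 ORDER BY yr, mnth
"""


def top_user_per_month(rows):
    """Return (uid, year, month, total_seconds) for the top user of each month."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE activity_table (uid INTEGER, activity_time TEXT, status TEXT)"
        )
        conn.executemany("INSERT INTO activity_table VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_user_per_month(ROWS):
        print(row)
```

On the sample data this gives totals of 11126, 25829, 18333 and 18333 seconds (03:05:26, 07:10:29, 05:05:33 and 05:05:33), for the same users and months as the expected output in the question but a few seconds away from the times listed there.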

0

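This is in SQL Server, but it should give you the idea. I first create a CTE that converts the time of day of each record to seconds, so that I can SUM the differences grouped by uid and MM-yyyy date and then convert the total back to a time format. A row_number then picks the largest total for each date.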

;WITH activity_table_seconds 
    AS (SELECT [uid], 
       [activity-time], 
       (Datepart(hour, [activity-time]) * 60 * 60) + ( 
       Datepart(minute, [activity-time]) * 60) + 
       Datepart(second, [activity-time]) AS 
       [activity-time-seconds], 
       [status] 
     FROM @activity_table) 
SELECT [uid], 
     [date], 
     [activity-time] 
FROM (SELECT *, 
       Row_number() 
       OVER ( 
        partition BY [date] 
        ORDER BY [activity-time] DESC) rn 
     FROM (SELECT a.[uid], 
         Format(a.[activity-time], 'MM-yyyy') AS [date], 
         CONVERT(VARCHAR(8), 
         Dateadd(second, Sum(b.[activity-time-seconds] - 
              a.[activity-time-seconds]), 0), 
           108) AS [activity-time] 
       FROM (SELECT * 
         FROM activity_table_seconds 
         WHERE [status] = 'sign in') a 
         INNER JOIN (SELECT * 
            FROM activity_table_seconds 
            WHERE [status] = 'sign out') b 
           ON a.[uid] = b.[uid] 
            AND Cast(a.[activity-time] AS DATE) = Cast( 
             b.[activity-time] AS DATE) 
       GROUP BY a.[uid], 
          Format(a.[activity-time], 'MM-yyyy')) a) b 
WHERE b.rn = 1
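
Both answers turn the summed seconds into text by going through a date/time type (from_unixtime in the first, Dateadd plus CONVERT style 108 here), which produces a time of day. A monthly total of 24 hours or more would therefore wrap around, so a user with 25 hours would show as 01:00:00. One possible way around that is to format the total with plain arithmetic, sketched below in Python. The helper name format_duration is hypothetical and not part of either answer.

```python
def format_duration(total_seconds: int) -> str:
    """Format a non-negative number of seconds as HH:MM:SS.

    The hour field is not capped at 23, so totals of a day or more
    stay readable (90000 seconds becomes 25:00:00).
    """
    if total_seconds < 0:
        raise ValueError("total_seconds must be non-negative")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    print(format_duration(25829))  # user 1's December 2016 total from the sample data
    print(format_duration(90000))  # a total longer than one day
```

The same arithmetic (integer division by 3600, then by 60 on the remainder) can be written directly in SQL if the formatting has to stay in the query.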