2016-12-07

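Counting the number of consecutive occurrences of a value in Hive/SQL

My table has three columns. For each userid, with rows ordered by time, I want to count the largest number of times in a row that value equals B. This is similar to finding the longest sublist of identical values. For example, given the following data: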

time        userid  value
2016-01-01  1       A
2016-01-02  1       B
2016-01-03  1       B
2016-01-04  2       C
2016-01-05  2       B
2016-01-06  2       B
2016-01-07  2       B
2016-01-08  2       C
2016-01-09  2       B

the query should return:

userid  times
1       2
2       3

Is this even possible without a Hive user-defined function? I have dug into LAG/LEAD a bit, but I could not find a way. :(

Answer

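-- Within each userid, rn numbers every row by time, while rn_val numbers
-- only the rows that share the same value. Inside an unbroken run of one
-- value both counters advance together, so rn - rn_val stays constant for
-- that run and changes once the run is interrupted.
select  value
       ,userid
       ,max(times) as times          -- longest run per (value, userid)

from   (select  value
               ,userid
               ,count(*) as times    -- length of one run

        from   (select  value
                       ,userid

                       ,row_number() over
                        (
                            partition by userid
                            order by     time
                        ) as rn

                       ,row_number() over
                        (
                            partition by userid, value
                            order by     time
                        ) as rn_val

                from    t
               ) t

        -- Optional filter for runs of B only. It is kept outside the innermost
        -- query so that rn is still computed over all of a user's rows.
        -- where value = 'B'

        group by value
                ,userid
                ,rn - rn_val
       ) t

group by value
        ,userid

order by value
        ,userid
;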
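
The question asks only for the longest run of B per user, returned as userid and times. The following is a minimal sketch of how the same row_number() difference could be narrowed to that output, not a definitive implementation. It assumes the table is named t with columns time, userid and value, as in the answer. The derived-table aliases numbered and runs, and the column name run_length, are illustrative names that do not appear in the original.

```sql
-- Sketch: longest run of consecutive 'B' rows per userid.
-- Assumed table: t(time, userid, value). The aliases numbered/runs and
-- the column run_length are hypothetical names chosen for readability.
select  userid
       ,max(run_length) as times
from   (select  userid
               ,count(*) as run_length
        from   (select  userid
                       ,value
                       ,row_number() over
                        (partition by userid order by time) as rn
                       ,row_number() over
                        (partition by userid, value order by time) as rn_val
                from    t
               ) numbered
        -- Filter after both row numbers are computed. Filtering before the
        -- window functions would number only the B rows, so rn would always
        -- equal rn_val and separate runs would merge into one group.
        where   value = 'B'
        group by userid
                ,rn - rn_val
       ) runs
group by userid
order by userid
;
```

Tracing the sample data for userid 2: the B rows get (rn, rn_val) pairs of (2,1), (3,2), (4,3) and (6,4). The differences are 1, 1, 1 and 2, which yields one run of length 3 and one of length 1, so times is 3. For userid 1 the two B rows both have a difference of 1, giving times 2.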