2017-05-03

Problem with ROW_NUMBER() OVER() syntax in Hive

I have a Hive table (details below):

hive> select * from abcd ; 
OK 
a 1 1 
b 2 2 
a 3 3 
Time taken: 0.261 seconds, Fetched: 3 row(s) 
hive> desc abcd; 
OK 
val001     string          
val002     int           
val003     int           
Time taken: 0.084 seconds, Fetched: 3 row(s) 

I wrote the following query, but I get the error below:

select max(rnk) rnk, max(val) val, sum(cnt) cnt from (select val, count(*) cnt, row_number() over (order by case val when null then 0 else count(*) end desc, val) rnk from (select VAL001 val from abcd) group by val) group by case when rnk <= 100 or val is null then rnk else 100 + 1 end; 

FAILED: ParseException line 3:55 missing ) at 'by' near 'by' 
line 3:58 missing EOF at 'val' near 'by' 

This is the result I expect from the query above:

RNK VAL    CNT 
--- ------------------------------ --- 
1 a     2 
2 b     1 

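I was able to produce the same kind of table in Oracle. The only difference is that in Oracle I order by a decode expression, and Hive has no Oracle-style decode, so I can't do the same thing there.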

Here is the Oracle query that works:

SQL> select max(rnk) rnk, max(val) val, sum(cnt) cnt from 
    (select val, count(*) cnt, row_number() over (order by 
    decode(val,null,0,count(*)) desc, val) rnk from (select VAL001 val from 
    table_name) group by val) 
    group by case when rnk <= 100 or val is null then rnk else 100 + 1 end; 

RNK VAL    CNT 
--- ------------------------------ --- 
1 a      2 
2 b      1 
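A note on translating the Oracle sort key: DECODE(val, NULL, 0, count(*)) treats a NULL val as matching NULL, so its CASE equivalent is the searched form CASE WHEN val IS NULL THEN 0 ELSE count(*) END. The simple form CASE val WHEN NULL ... used in the Hive query compares val = NULL, which is never true, so NULL rows fall through to ELSE. The sketch below checks this with Python's built-in sqlite3 module, assuming SQLite is an acceptable stand-in for Hive here because both follow standard SQL NULL comparison; the table t and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption:
# only standard CASE/NULL semantics are being illustrated).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("a",), (None,)])

# Simple CASE: evaluates val = NULL, which is never true,
# so the NULL row falls through to ELSE.
simple_case = conn.execute(
    "SELECT val, CASE val WHEN NULL THEN 0 ELSE 1 END FROM t ORDER BY val"
).fetchall()

# Searched CASE with IS NULL: matches the NULL row,
# like Oracle's DECODE(val, NULL, 0, ...).
searched_case = conn.execute(
    "SELECT val, CASE WHEN val IS NULL THEN 0 ELSE 1 END FROM t ORDER BY val"
).fetchall()

print(simple_case)    # [(None, 1), ('a', 1)]
print(searched_case)  # [(None, 0), ('a', 1)]
```

SQLite sorts NULL first in ascending order, which is why the NULL row is listed first in both results.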

Can anyone help me fix the Hive query? Let me know if you need more details.

Answer

This is your query. I suspect there is a simpler way to get what you want:

select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt 
from (select val, count(*) as cnt, 
      row_number() over (order by case val when null then 0 else count(*) end desc, val) as rnk 
     from (select VAL001 val from abcd) 
     group by val 
    ) 
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end; 

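I think you just need table aliases for the subqueries in the from clause (Hive requires every subquery in FROM to have a name):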

select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt 
from (select val, count(*) as cnt, 
      -- "case when val is null" mirrors Oracle's decode; "case val when null" never matches NULL
      row_number() over (order by case when val is null then 0 else count(*) end desc, val) as rnk 
     from (select VAL001 val from abcd 
      ) x 
     group by val 
    ) x 
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end; 
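To sanity-check the aliased query against the sample rows from the question, here is a minimal sketch that runs it through Python's sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for Hive; the sketch uses the searched CASE WHEN val IS NULL form, gives the two derived tables distinct aliases x and y, and adds a final order by rnk so the row order is deterministic. None of this demonstrates Hive-specific behaviour.

```python
import sqlite3

# SQLite stand-in for the Hive table abcd from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abcd (val001 TEXT, val002 INTEGER, val003 INTEGER)")
conn.executemany(
    "INSERT INTO abcd VALUES (?, ?, ?)",
    [("a", 1, 1), ("b", 2, 2), ("a", 3, 3)],
)

# Every derived table in FROM carries an alias (x, y).
QUERY = """
select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from (select val, count(*) as cnt,
             row_number() over (
               order by case when val is null then 0 else count(*) end desc, val
             ) as rnk
      from (select val001 as val from abcd) x
      group by val
     ) y
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end
order by rnk
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 'a', 2), (2, 'b', 1)]
```

The output matches the RNK/VAL/CNT table the question expects.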

This helps, thanks a lot :) ... Do you also have a suggestion for a simpler version of this query? – HiveRLearner


Do you mean a single query that achieves the result? That would earn extra credit :) –


This is not technically a simpler solution, but it may be easier to read:

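the first subquery does the counting and ranking,

the second subquery assigns each value a category: top 1 to top 100, plus the special categories other (ranked below the top 100) and unknown (NULL values),

the final query does the grouping.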

with cnt as (
select VAL001 val, 
    count(*) as cnt, 
    -- Hive has no Oracle-style decode; a searched CASE does the same job
    row_number() over (order by case when VAL001 is null then 0 else count(*) end desc, VAL001) as rnk 
from abcd 
group by VAL001), 
ctg as (
select 
    val, cnt, rnk, 
    case when val is NULL then 'unknown' 
     when rnk <= 100 then concat('top ', cast(rnk as string)) 
     else 'other' end as category_code 
from cnt) 
select 
    max(rnk) as rnk, max(val) as val, sum(cnt) as cnt 
from ctg 
group by category_code 
order by rnk
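The categorization step is easier to see with a NULL value and more distinct values than the cutoff allows. The sketch below runs the same CTE shape in SQLite through Python's sqlite3 module, with a hypothetical cutoff TOP_N = 2 instead of 100 and hypothetical sample rows. SQLite builds the label with the || operator where the Hive version uses concat(); treating SQLite as a stand-in for Hive is an assumption of the sketch.

```python
import sqlite3

TOP_N = 2  # hypothetical cutoff; the answer uses 100

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abcd (val001 TEXT)")
# Hypothetical sample: a x3, b x2, c x1, d x1, NULL x2
conn.executemany(
    "INSERT INTO abcd VALUES (?)",
    [("a",), ("a",), ("a",), ("b",), ("b",), ("c",), ("d",), (None,), (None,)],
)

QUERY = f"""
with cnt as (
  -- count and rank each value; NULL gets sort key 0 so it ranks last
  select val001 as val,
         count(*) as cnt,
         row_number() over (
           order by case when val001 is null then 0 else count(*) end desc, val001
         ) as rnk
  from abcd
  group by val001),
ctg as (
  -- label each value: unknown (NULL), top N, or other
  select val, cnt, rnk,
         case when val is null then 'unknown'
              when rnk <= {TOP_N} then 'top ' || rnk
              else 'other' end as category_code
  from cnt)
select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from ctg
group by category_code
order by rnk
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With this data the result is (1, 'a', 3), (2, 'b', 2), (4, 'd', 2), (5, None, 2): c and d (ranks 3 and 4) collapse into one other row that reports max(rnk) = 4 and max(val) = 'd', and the two NULL rows form the unknown row with rank 5.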