2017-05-30

Using HiveQL, how do I pull the rows with the highest integer?

I have a table with several million rows of data that looks like this:

+----------+-------------+--------------+
| page     | search_term | interactions |
+----------+-------------+--------------+
| /mom     | pizza       |           15 |
| /dad     | pizza       |            8 |
| /uncle   | pizza       |            2 |
| /brother | pizza       |            7 |
| /mom     | pasta       |           12 |
| /dad     | pasta       |           23 |
+----------+-------------+--------------+

My goal is to run a HiveQL query that returns the largest "interactions" number for each unique page/term combination. For example:

+----------+-------------+--------------+
| page     | search_term | interactions |
+----------+-------------+--------------+
| /dad     | pasta       |           23 |
| /mom     | pizza       |           15 |
+----------+-------------+--------------+

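How would I write this, given that each page has hundreds of thousands of search_terms, but I only want to pull the one search_term with the most interactions? I have tried max(interactions) and max(struct(interactions, search_term)).col1, but with no luck. No matter how many interactions there are, my output keeps giving me every search_term for each page.

Thanks!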

Answer


Use the row_number() analytic function:

select page, search_term, interactions
from (
    select page, search_term, interactions,
           row_number() over (partition by page order by interactions desc) as rn
    from your_table  -- your_table is a placeholder; the table is never named in the question
) s
where rn = 1;
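
The question mentions trying max(struct(...)) without success. Below is a minimal sketch of how that approach might be made to work. It assumes the table is called your_table (a placeholder, since the question never names it) and a Hive version whose max() accepts struct arguments. The idea is to group by page only and to put interactions first in the struct so that it drives the comparison. Hive names the fields of an unnamed struct col1, col2, and so on.

```sql
-- your_table is a hypothetical name; substitute the real table.
select page,
       best.col2 as search_term,   -- second field of the struct
       best.col1 as interactions   -- first field of the struct
from (
    select page,
           -- structs compare field by field, so the highest interactions wins;
           -- ties fall back to the larger search_term
           max(struct(interactions, search_term)) as best
    from your_table
    group by page                  -- also grouping by search_term would return every term
) t;
```

Both this sketch and the row_number() query return one row per page, so the six sample rows would yield four rows (one each for /mom, /dad, /uncle and /brother). The two-row result shown in the question corresponds to the top row per search_term; if that is what is wanted, partitioning by search_term instead of page in the row_number() query should produce it.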
