2017-08-04 138 views
0

I have the data below, and I want a Hive query that finds the maximum time for each ID.

ID  time 
12 10038446 201705102100 
13 10038446 201706052100 
14 10038446 201706060000 
15 10038446 201706060100 
16 10103517 201705101700 
17 10103517 201705102100 
18 10103517 201706052100 
19 10103517 201706060100 
20 10124464 201701310100 
21 10124464 201702210500 
22 10124464 201702220500 
23 10124464 201703062100 
24 10124464 201705102100 
25 10124464 201706052100 
26 10124464 201706060100 

The output I want is below:

15 10038446 201706060100 
19 10103517 201706060100 
26 10124464 201706060100 
37 1019933 201706052100 

How can I get the most recent time for each ID using a Hive query?

Answers

0

Try this:

select ID, time 
from 
(
    -- rank each ID's rows from newest to oldest time
    select 
    ID, 
    time, 
    row_number() over (partition by ID order by time desc) as time_rank 
    from table_name 
) x 
where time_rank = 1  -- one row per ID, so no group by is needed

Without subqueries (on older Hive versions), a temporary table is an option.

-- materialize the ranking that the subquery would have produced
create table tmp_table as 
select 
    ID, 
    time, 
    row_number() over (partition by ID order by time desc) as time_rank 
from table_name; 

-- keep only the newest row per ID
select ID, time 
from tmp_table 
where time_rank = 1; 

drop table tmp_table; 
+0

I am using an older Hive version that does not support subqueries. Are there any other options? – Ganesh

+0

Well, one option is to create a temporary table from the data the subquery would produce, and then select from it. – Wonjin

+0

Could you please show me how to do that? – Ganesh

0

Use a simple aggregation:

select id, max(time) as time 
    from table_name -- placeholder; substitute your own table name
group by id 
order by id; --order if necessary 

Demo on your dataset:

select id, max(time) as time 
from 
table_name 
group by id; 

OK 
10038446  201706060100 
10103517  201706060100 
10124464  201706060100 
Time taken: 30.66 seconds, Fetched: 3 row(s) 
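
If the other columns of the newest row are also needed (the expected output in the question shows a leading row number), one possible approach is to join the aggregate back to the table. This is only a sketch: `row_id` is a hypothetical name for the unnamed leading column, `table_name` is a placeholder, and the query needs a subquery in the FROM clause, so on a version without subquery support the inner query would first have to be written to a temporary table.

```sql
-- row_id is a hypothetical name for the unnamed leading column in the question's data
select t.row_id, t.id, t.time
from table_name t
join (
    -- newest time per id
    select id, max(time) as max_time
    from table_name
    group by id
) m
  on t.id = m.id and t.time = m.max_time;
```

If two rows of the same id share the maximum time, both are returned.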
+0

I tried that earlier. It did not give the expected result and returned the full result set instead. – Ganesh

+0

Have you tried max(cast(time as bigint))? – jd1338

+0

It should work with strings as well as integers. – leftjoin
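
For reference, a minimal sketch of the cast variant discussed in these comments, assuming `time` is stored as a string and `table_name` is a placeholder. The values here have 12 digits (for example 201706060100), which exceeds the 32-bit INT maximum of 2147483647, so the sketch casts to BIGINT instead.

```sql
-- cast to bigint: 12-digit values such as 201706060100 do not fit in a 32-bit int
select id, max(cast(time as bigint)) as time
from table_name
group by id;
```

When every value has the same fixed-width yyyyMMddHHmm layout, the string maximum and the numeric maximum pick the same row, so the cast should only matter if the stored values vary in length or format.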