2017-09-26 163 views

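Merging duplicate records in a Hive table

I have the table below, which receives incremental updates. I need to write a plain Hive query that merges the rows sharing the same key, keeping the most recent non-null value of each column.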

Key | A | B | C | Timestamp 
K1 | X | Null | Null | 2015-05-03 
K1 | Null | Y | Z | 2015-05-02 
K1 | Foo | Bar | Baz | 2015-05-01 

Desired result:

Key | A | B | C | Timestamp 
K1 | X | Y | Z | 2015-05-03 
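My first thought was coalesce, but I don't think that is right. If there are only a few columns you could try it, since Hive does not support recursive CTE calls.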

To create a new CTE you would have to create a new table or temporary storage. In that case I have a solution ..

Answer


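Use the first_value() function to get the most recent non-null value of each column. The sort keys (a null flag and the timestamp) have to be concatenated into a single expression, because last_value works with only one sort key.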

Demo:

select distinct 
key, 
-- Sort key per column: '2_<timestamp>' for non-null values, '1_<timestamp>' for nulls, 
-- so descending order puts the newest non-null value first. 
first_value(A) over (partition by Key order by concat(case when A is null then '1' else '2' end, '_', Timestamp) desc) A, 
first_value(B) over (partition by Key order by concat(case when B is null then '1' else '2' end, '_', Timestamp) desc) B, 
first_value(C) over (partition by Key order by concat(case when C is null then '1' else '2' end, '_', Timestamp) desc) C, 
max(timestamp) over (partition by key) timestamp 
from 
( -- Replace this subquery with your table 
select 'K1' key, 'X' a, null b, null c, '2015-05-03' timestamp union all 
select 'K1' key, null a, 'Y' b, 'Z' c, '2015-05-02' timestamp union all 
select 'K1' key, 'Foo' a, 'Bar' b, 'Baz' c, '2015-05-01' timestamp 
) s 
; 

Output:

OK 
key  a  b  c  timestamp 
K1  X  Y  Z  2015-05-03
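
A possible alternative to the concatenated sort key, offered only as a sketch: the Hive windowing documentation describes an optional second boolean argument to first_value() that skips null values when set to true. Assuming a Hive version that supports that argument, the same merge might be written as below. The table name updates is hypothetical, and the backticks around timestamp are a precaution in case it is treated as a reserved word. The window frame is widened explicitly so that every row of a key sees the whole partition; with the default frame, the newest row would only see itself and could still return null.

```sql
-- Sketch under stated assumptions: first_value(col, true) skips nulls;
-- `updates` is a hypothetical table with columns key, a, b, c, timestamp.
select distinct
  key,
  -- newest non-null value of each column within the key
  first_value(a, true) over (partition by key order by `timestamp` desc
      rows between unbounded preceding and unbounded following) as a,
  first_value(b, true) over (partition by key order by `timestamp` desc
      rows between unbounded preceding and unbounded following) as b,
  first_value(c, true) over (partition by key order by `timestamp` desc
      rows between unbounded preceding and unbounded following) as c,
  -- latest update time for the key
  max(`timestamp`) over (partition by key) as `timestamp`
from updates;
```

This variant has not been run against the sample rows here, so its result should be compared with the demo output above before relying on it.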