2017-07-26 50 views
2

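Select incremental data from multiple tables in Hive

I have five tables (A, B, C, D, E) in a Hive database, and I have to merge the data from these tables based on logic on the column "id".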

The conditions are:

select * from A
UNION
select * from B (only ids not in A)
UNION
select * from C (only ids not in A or B)
UNION
select * from D (only ids not in A, B or C)
UNION
select * from E (only ids not in A, B, C or D)

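This data has to be inserted into the final table.

One approach is to create a target table (target), add the data from each UNION stage to it, and then join this table against the source of the next UNION stage.

This would be part of my .hql file: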

-- A wins; from B take only rows whose id is absent from A
insert into target
select * from A
UNION
select B.* from
A
RIGHT OUTER JOIN B
on A.id=B.id
where ISNULL(A.id);

INSERT INTO target 
select C.* from 
target 
RIGHT outer JOIN C 
ON target.id=C.id 
where ISNULL(target.id); 

INSERT INTO target 
select D.* from 
target 
RIGHT OUTER JOIN D 
ON target.id=D.id 
where ISNULL(target.id); 

INSERT INTO target 
select E.* from 
target 
RIGHT OUTER JOIN E 
ON target.id=E.id 
where ISNULL(target.id); 

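Is there a better way to do this? I assume that either way we have to do multiple joins/lookups. I am looking for the best approach to achieve this in

1) Hive with Tez

2) Spark SQL

Many thanks in advance.

Answers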

1

If id is unique within each table, row_number can be used instead of rank.

select *
from (
    select *,
           rank() over (partition by id order by src) as rnk
    from (
        select 1 as src, * from a
        union all select 2 as src, * from b
        union all select 3 as src, * from c
        union all select 4 as src, * from d
        union all select 5 as src, * from e
    ) t
) t
where rnk = 1;
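
For the case where id is unique within each table, the row_number variant could look like the sketch below. It assumes each source table has just two columns, `id` and `val` (hypothetical names, since the question does not show the schema), and that `target` has the same two columns. Listing the columns explicitly keeps the helper columns `src` and `rn` out of the inserted rows.

```sql
-- Sketch only: id and val are assumed column names, not taken from the question.
-- target is the final table mentioned in the question.
INSERT INTO TABLE target
SELECT id, val
FROM (
    SELECT id, val,
           -- number the rows per id, lowest source number first
           row_number() OVER (PARTITION BY id ORDER BY src) AS rn
    FROM (
        SELECT 1 AS src, id, val FROM a
        UNION ALL SELECT 2 AS src, id, val FROM b
        UNION ALL SELECT 3 AS src, id, val FROM c
        UNION ALL SELECT 4 AS src, id, val FROM d
        UNION ALL SELECT 5 AS src, id, val FROM e
    ) u
) ranked
WHERE rn = 1;  -- keep only the row from the highest-priority table
```

The difference matters when a table repeats an id: rank gives every row from the winning table the value 1, so all of them are kept, while row_number keeps exactly one row per id.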
0

I think I would try it like this:

with ids as (
     select id, min(which) as which
     from (select id, 1 as which from a union all
      select id, 2 as which from b union all
      select id, 3 as which from c union all
      select id, 4 as which from d union all
      select id, 5 as which from e
      ) x
     group by id
    )
select a.* 
from a join ids on a.id = ids.id and ids.which = 1 
union all 
select b.* 
from b join ids on b.id = ids.id and ids.which = 2 
union all 
select c.* 
from c join ids on c.id = ids.id and ids.which = 3 
union all 
select d.* 
from d join ids on d.id = ids.id and ids.which = 4 
union all 
select e.* 
from e join ids on e.id = ids.id and ids.which = 5; 
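
Whichever variant is used, a quick check afterwards can confirm that no id was loaded from more than one source. A minimal sketch, assuming the result was written to the `target` table from the question:

```sql
-- Sketch only: assumes the merged rows were inserted into target.
-- Lists every id that appears more than once in the merged result.
SELECT id, COUNT(*) AS n
FROM target
GROUP BY id
HAVING COUNT(*) > 1;
```

An empty result means each id appears once. This is only a meaningful check when ids are unique within each source table; otherwise repeated ids from the winning table will show up here too.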
+0

Too complicated.