How can I optimize a non-equi join in Hive?

I have two tables: a (1,000 rows) and b (70 million rows).

Table a has two fields, starttime and endtime, and table b has one field, time.

I query them with a map join:

select /*+ MAPJOIN(a) */ a.starttime,a.endtime, b.time 
from a join b 
where b.time between a.starttime and a.endtime;

But the query runs extremely slowly; the MapReduce job stays at 0%.

Is there another way to optimize this?

Source

2016-07-05 Guo

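One method is to expand a so that it has one row per day; the range condition can then be replaced by an equality on the day.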
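A minimal sketch of the per-day expansion, assuming starttime, endtime and time are timestamps (or 'yyyy-MM-dd HH:mm:ss' strings) and that the ranges in a span a modest number of days. The names a_days and day_key are hypothetical and not part of the original question. The sketch has not been run against the asker's data.

```sql
-- Expand each row of a into one row per calendar day it covers.
-- space(n) yields n spaces, so split(...) yields n + 1 elements (pos = 0..n).
with a_days as (
  select a.starttime,
         a.endtime,
         date_add(to_date(a.starttime), d.pos) as day_key
  from a
  lateral view posexplode(
    split(space(datediff(a.endtime, a.starttime)), ' ')
  ) d as pos, val
)
select a_days.starttime, a_days.endtime, b.time
from b
join a_days
  on to_date(b.time) = a_days.day_key          -- equijoin on the day
where b.time between a_days.starttime
                 and a_days.endtime;           -- exact check for partial days
```

The equality on day_key gives Hive a join key to work with, while the between filter keeps the original semantics for the first and last day of each range. If the ranges in a are long, a_days can become much larger than a, so its size is worth checking before relying on a map join.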

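Another method is to use an interleaving technique. This assumes that a really partitions time, so that there are no overlaps or gaps, and that b has a primary key.

So, for each id in b, you can get the corresponding start time in a: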

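select ab.id, ab.time, a.starttime, a.endtime
from (select id, time,
             -- latest starttime seen so far; priority sorts a's row before b rows at the same time
             max(starttime) over (order by time, priority) as a_starttime
      from (select b.id, b.time, null as starttime, 2 as priority from b
            union all
            -- column names are aliased to match the first branch of the union
            select null as id, a.starttime as time, a.starttime as starttime, 1 as priority from a
           ) u
     ) ab join
     a
     on ab.a_starttime = a.starttime
where ab.id is not null;  -- drop the marker rows that came from a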

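Note: the core of the technique is the interleaving subquery, which gives each row of b the most recent starttime from a: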

select id, time,
       max(starttime) over (order by time, priority) as a_starttime
from (select b.id, b.time, null as starttime, 2 as priority from b
      union all
      select null as id, a.starttime as time, a.starttime as starttime, 1 as priority from a
     ) ab;

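You can then use its result with an equijoin. The technique works in other databases as well. I have not had the opportunity to try it on Hive.

Source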
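Because the interleaving approach relies on the ranges in a not overlapping, a check along these lines may be worth running first. It is a sketch only: prev_max_endtime is a hypothetical name, and the query assumes that a range overlaps an earlier one when it starts at or before the latest endtime seen so far.

```sql
-- Lists ranges in a that begin before all earlier ranges have ended.
-- An empty result suggests a has no overlapping ranges.
select starttime, endtime, prev_max_endtime
from (
  select starttime,
         endtime,
         max(endtime) over (
           order by starttime
           rows between unbounded preceding and 1 preceding
         ) as prev_max_endtime
  from a
) t
where prev_max_endtime >= starttime;
```

Gaps are harder to test for in general, since that depends on whether endtime is inclusive. If a might contain gaps, a row of b that falls inside one would be matched to the preceding range, so adding a condition such as ab.time <= a.endtime to the final equijoin query should filter those rows out.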

2016-07-05 11:00:19

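Thanks for your reply! Actually, both tables have many fields, so the interleaving technique looks cumbersome and inconvenient, doesn't it? Is there another approach for this case? – Guo

@Guo . . . Not that I can think of in Hive. –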
