2017-10-20

Hive inner join of two tables, table1 and table2, returns wrong results

hive> select * from table1 where dt=20171020; 
OK 
a 1 1 p 10 20171020 
b 2 2 q 10 20171020 
c 3 3 r 10 20171020 
d 4 4 r 10 20171020 

hive> select * from table2 where dt=20171020; 
OK 
a 1 1 p 10 20171020 
b 2 2 t 10 20171020 
c 3 3 r 10 20171020 

hive> select * from table1 t1 
    > join table2 t2 
    > on t1.c1=t2.c1 
    > where 
    > t1.dt=20171020 and t2.dt=20171020 and 
    > t1.c2 <> t2.c2 or t1.c3 <> t2.c3 or t1.c4 <> t2.c4 or t1.c5 <> t2.c5; 

Result: 
a 1 1 p 20 20171016 a 1 1 p 10 20171015 
a 1 1 p 20 20171016 a 1 1 p 10 20171020 
b 2 2 q 20 20171016 b 2 2 t 10 20171015 
b 2 2 q 20 20171016 b 2 2 t 10 20171020 
c 3 3 r 20 20171016 c 3 3 r 10 20171015 
c 3 3 r 20 20171016 c 3 3 r 10 20171020 
b 2 2 q 10 20171020 b 2 2 t 10 20171015 
b 2 2 q 10 20171020 b 2 2 t 10 20171020 
a 19 19 p 20 20171019 a 1 1 p 10 20171015 
a 19 19 p 20 20171019 a 1 1 p 10 20171020 

Only this row has changed, so I want the output to be just the row below. How should the Hive join in the code above be written?

b 2 2 q 10 20171020 

Answer

Try this. Your join should also be on the date column (dt).

SELECT * 
FROM table1 t1 
     JOIN table2 t2 
     ON t1.c1 = t2.c1 
      AND t1.dt = t2.dt 
WHERE t1.dt = 20171020 
     AND (t1.c2 <> t2.c2 
       OR t1.c3 <> t2.c3 
       OR t1.c4 <> t2.c4 
       OR t1.c5 <> t2.c5); 

What is wrong with my logic? – rajs

In practice we have to use different dates, like t1.dt = 20171019 and t2.dt = 20171020. What is the use of adding dt to the join condition? – rajs

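If you don't add dt to the join, the OR conditions are applied across all dates. Also check the parentheses around the ORs. For different dates you can leave dt out of the join, but keep the ORs in parentheses. –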
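
The precedence point in the last comment can be checked on a small sample. The sketch below is not Hive: it uses Python's built-in sqlite3 module as a stand-in, on the assumption that AND binds tighter than OR in both engines, as in standard SQL. The 20171020 rows are copied from the question, and the 20171016 and 20171015 rows are taken from its result listing. The helper name changed_rows and its two date parameters are hypothetical, added only to cover the different-dates case raised in the comments.

```python
import sqlite3

# Sample rows (c1, c2, c3, c4, c5, dt). The 20171020 rows come from the
# question's SELECT output; the older rows come from its "Result" listing.
TABLE1 = [
    ("a", 1, 1, "p", 10, 20171020),
    ("b", 2, 2, "q", 10, 20171020),
    ("c", 3, 3, "r", 10, 20171020),
    ("d", 4, 4, "r", 10, 20171020),
    ("a", 1, 1, "p", 20, 20171016),
]
TABLE2 = [
    ("a", 1, 1, "p", 10, 20171020),
    ("b", 2, 2, "t", 10, 20171020),
    ("c", 3, 3, "r", 10, 20171020),
    ("a", 1, 1, "p", 10, 20171015),
]


def make_db():
    """Build an in-memory SQLite database holding both sample tables."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("table1", TABLE1), ("table2", TABLE2)):
        conn.execute(
            f"CREATE TABLE {name} "
            "(c1 TEXT, c2 INT, c3 INT, c4 TEXT, c5 INT, dt INT)"
        )
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


# The question's WHERE clause, unchanged. Because AND binds tighter than OR,
# the date filters only guard the c2 comparison; the c3, c4 and c5
# comparisons are evaluated for every date combination.
UNPARENTHESIZED = """
SELECT * FROM table1 t1 JOIN table2 t2 ON t1.c1 = t2.c1
WHERE t1.dt = 20171020 AND t2.dt = 20171020 AND
      t1.c2 <> t2.c2 OR t1.c3 <> t2.c3 OR t1.c4 <> t2.c4 OR t1.c5 <> t2.c5
"""

# Each side gets its own date filter and the ORs are grouped in parentheses.
PARENTHESIZED = """
SELECT t1.* FROM table1 t1 JOIN table2 t2 ON t1.c1 = t2.c1
WHERE t1.dt = ? AND t2.dt = ?
  AND (t1.c2 <> t2.c2 OR t1.c3 <> t2.c3 OR t1.c4 <> t2.c4 OR t1.c5 <> t2.c5)
"""


def changed_rows(conn, dt1, dt2):
    """Hypothetical helper: table1 rows for dt1 that differ from table2 at dt2."""
    return conn.execute(PARENTHESIZED, (dt1, dt2)).fetchall()


conn = make_db()
leaky = conn.execute(UNPARENTHESIZED).fetchall()
fixed = changed_rows(conn, 20171020, 20171020)
print(len(leaky), "rows without parentheses")
print(fixed)
```

On this sample the unparenthesized query also returns pairs involving the 20171016 row, while the parenthesized one returns only the b row for 20171020. Passing two different dates, for example changed_rows(conn, 20171016, 20171020), compares one day's table1 rows against another day's table2 rows without putting dt in the join.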