2017-10-19 129 views

Are Hive's implicit joins always inner joins? The join documentation for Hive encourages the use of implicit joins, i.e.

SELECT * 
FROM table1 t1, table2 t2, table3 t3 
WHERE t1.id = t2.id AND t2.id = t3.id AND t1.zipcode = '02535'; 

Is this equivalent to

SELECT t1.*, t2.*, t3.* 
FROM table1 t1 
INNER JOIN table2 t2 ON 
    t1.id = t2.id 
INNER JOIN table3 t3 ON 
    t2.id = t3.id 
WHERE t1.zipcode = '02535' 

or will the above return additional records?

Answer


Not always. Your two queries are the same. But without WHERE t1.id = t2.id AND t2.id = t3.id it would be a CROSS JOIN.
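
To make the cross-join case concrete, here is a minimal sketch of the question's query with the join predicates removed, next to an explicit spelling of the same thing. It assumes the table1, table2 and table3 from the question and a Hive version that accepts the CROSS JOIN keyword. Depending on configuration, Hive may reject cartesian products when running in strict mode.

```sql
-- Implicit join with no join predicates: each t1 row that passes the
-- zipcode filter is paired with every row of t2 and every row of t3.
SELECT *
FROM table1 t1, table2 t2, table3 t3
WHERE t1.zipcode = '02535';

-- The same cross product written explicitly, so the intent is visible.
SELECT *
FROM table1 t1
CROSS JOIN table2 t2
CROSS JOIN table3 t3
WHERE t1.zipcode = '02535';
```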

Update:

This is an interesting question, so I decided to add some demos. Let's create two tables:

A(c1 int, c2 string) and B(c1 int, c2 string)
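
The answer gives only the column lists, so the DDL below is an assumed sketch of how the two demo tables might be created. Storage format and location are left at the Hive defaults.

```sql
-- Assumed DDL for the demo tables; only the column definitions come from the answer.
CREATE TABLE IF NOT EXISTS A (c1 INT, c2 STRING);
CREATE TABLE IF NOT EXISTS B (c1 INT, c2 STRING);
```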

Load data:

insert into table A 
select 1, 'row one' union all 
select 2, 'row two'; 

insert into table B 
select 1, 'row one' union all 
select 3, 'row three'; 

Check data:

hive> select * from A; 
OK 
1  row one 
2  row two 
Time taken: 1.29 seconds, Fetched: 2 row(s) 
hive> select * from B; 
OK 
1  row one 
3  row three 
Time taken: 0.091 seconds, Fetched: 2 row(s) 

Check a cross join (an implicit join without WHERE is transformed into CROSS):

hive> select a.c1, a.c2, b.c1, b.c2 from a,b; 
Warning: Map Join MAPJOIN[14][bigTable=a] in task 'Stage-3:MAPRED' is a cross product 
Warning: Map Join MAPJOIN[22][bigTable=b] in task 'Stage-4:MAPRED' is a cross product 
Warning: Shuffle Join JOIN[4][tables = [a, b]] in Stage 'Stage-1:MAPRED' is a cross product 

OK 
1  row one 1  row one 
2  row two 1  row one 
1  row one 3  row three 
2  row two 3  row three 
Time taken: 54.804 seconds, Fetched: 4 row(s) 

Check an inner join (an implicit join with WHERE works as INNER):

hive> select a.c1, a.c2, b.c1, b.c2 from a,b where a.c1=b.c1; 
OK 
1  row one 1  row one 
Time taken: 38.413 seconds, Fetched: 1 row(s) 

Try to perform a left join by adding OR b.c1 is null to the WHERE:

hive> select a.c1, a.c2, b.c1, b.c2 from a,b where (a.c1=b.c1) OR (b.c1 is null); 
OK 
1  row one 1  row one 
Time taken: 57.317 seconds, Fetched: 1 row(s) 

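As you can see, we got an inner join again. or b.c1 is null has no effect, because the implicit join never produces rows with a NULL b.c1.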

Now a left join without WHERE and without an ON clause (transformed into CROSS):

select a.c1, a.c2, b.c1, b.c2 from a left join b; 
OK 
1  row one 1  row one 
1  row one 3  row three 
2  row two 1  row one 
2  row two 3  row three 
Time taken: 37.104 seconds, Fetched: 4 row(s) 

As you can see, we got a cross join again.

Try a left join with a WHERE clause and without ON (works as INNER):

select a.c1, a.c2, b.c1, b.c2 from a left join b where a.c1=b.c1; 
OK 
1  row one 1  row one 
Time taken: 40.617 seconds, Fetched: 1 row(s) 

We got an INNER JOIN.

Try a left join with a WHERE clause and without ON, plus an attempt to allow NULLs:

select a.c1, a.c2, b.c1, b.c2 from a left join b where a.c1=b.c1 or b.c1 is null; 
OK 
1  row one 1  row one 
Time taken: 53.873 seconds, Fetched: 1 row(s) 

Got INNER again. or b.c1 is null has no effect.

LEFT JOIN with an ON clause:

hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1; 
OK 
1  row one 1  row one 
2  row two NULL NULL 
Time taken: 48.626 seconds, Fetched: 2 row(s) 

Yes, this is a true left join.

LEFT JOIN with ON + WHERE (got INNER):

hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1 where a.c1=b.c1; 
OK 
1  row one 1  row one 
Time taken: 49.54 seconds, Fetched: 1 row(s) 

We got INNER because the WHERE does not allow NULLs.

LEFT JOIN with ON + WHERE allowing NULLs:

hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1 where a.c1=b.c1 or b.c1 is null; 
OK 
1  row one 1  row one 
2  row two NULL NULL 
Time taken: 55.951 seconds, Fetched: 2 row(s) 

Yes, this is a left join.
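
A related way to keep left-join behavior while filtering the right-hand table is to put that filter in the ON clause instead of WHERE. The sketch below is an illustration, not part of the original demo run; the b.c2 = 'row one' filter is an assumed example predicate on the demo tables.

```sql
-- Filter on b inside ON: unmatched rows of a are kept,
-- with NULLs in the b columns.
SELECT a.c1, a.c2, b.c1, b.c2
FROM a
LEFT JOIN b
  ON a.c1 = b.c1 AND b.c2 = 'row one';

-- Same filter in WHERE: rows where b.c2 is NULL fail the comparison,
-- so only matching rows survive and the join behaves like INNER.
SELECT a.c1, a.c2, b.c1, b.c2
FROM a
LEFT JOIN b
  ON a.c1 = b.c1
WHERE b.c2 = 'row one';
```

With the demo data, the first query should return both rows of a, and the second only the row with c1 = 1.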

Conclusion:

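  1. An implicit join works as INNER (with WHERE) or as CROSS if there is no WHERE clause.
  2. A left join can work as CROSS if there is no ON and no WHERE, and it can work as INNER if the WHERE clause does not allow NULLs.
  3. It is better to use ANSI syntax because it is self-explanatory and makes it easy to see how you expect the join to work. Implicit joins, or left joins that work as INNER or CROSS, are hard to understand and very error-prone.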
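
As a short sketch of the third point, the three behaviors seen in the demos can each be requested by name in ANSI syntax. This uses the same demo tables a and b and assumes a Hive version that accepts the CROSS JOIN keyword.

```sql
-- INNER: only rows with a match in both tables.
SELECT a.c1, a.c2, b.c1, b.c2
FROM a
INNER JOIN b ON a.c1 = b.c1;

-- LEFT: every row of a, with NULLs where b has no match.
SELECT a.c1, a.c2, b.c1, b.c2
FROM a
LEFT JOIN b ON a.c1 = b.c1;

-- CROSS: every pairing of rows, requested on purpose.
SELECT a.c1, a.c2, b.c1, b.c2
FROM a
CROSS JOIN b;
```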

I finally came back to this as I got to more complex queries. Very cool documentation of the behavior. –


These detailed demos make it very clear how to write inner joins and left joins. Thanks a bunch! – Xiao