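I have a best-practice question. I'm not sure which approach works best when the data can vary in size. Should I use a SQL JOIN or an IN clause?
Consider the following 3 tables: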
EMPLOYEE
EMPLOYEE_ID, EMP_NAME
PROJECT
PROJECT_ID, PROJ_NAME
EMP_PROJ (many-to-many between the two tables above)
EMPLOYEE_ID, PROJECT_ID
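Problem: given an employee ID, find all employees on all projects that this employee is associated with.
I have tried the two approaches below. Whatever data size I use, they differ by only a few milliseconds.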
SELECT EMP_NAME FROM EMPLOYEE
WHERE EMPLOYEE_ID IN (
    SELECT EMPLOYEE_ID FROM EMP_PROJ
    WHERE PROJECT_ID IN (
        SELECT PROJECT_ID FROM EMP_PROJ p, EMPLOYEE e
        WHERE p.EMPLOYEE_ID = e.EMPLOYEE_ID
        AND e.EMPLOYEE_ID = 123))
versus
SELECT c.EMP_NAME FROM
    (SELECT PROJECT_ID FROM EMP_PROJ
     WHERE EMPLOYEE_ID = 123) a
JOIN
    EMP_PROJ b
    ON a.PROJECT_ID = b.PROJECT_ID
JOIN
    EMPLOYEE c
    ON b.EMPLOYEE_ID = c.EMPLOYEE_ID
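Before comparing speed, it may be worth checking that the two queries above return the same rows. The following is only a minimal sketch: it assumes SQLite (through Python's standard sqlite3 module) as a stand-in for the real database, and the column types, key constraints, and sample rows are made up for illustration. With this sample data, the IN version lists each employee once, while the JOIN version lists an employee once per project shared with employee 123.

```python
import sqlite3

# In-memory database; column types and constraints are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (EMPLOYEE_ID INTEGER PRIMARY KEY, EMP_NAME TEXT);
CREATE TABLE PROJECT  (PROJECT_ID  INTEGER PRIMARY KEY, PROJ_NAME TEXT);
CREATE TABLE EMP_PROJ (
    EMPLOYEE_ID INTEGER REFERENCES EMPLOYEE(EMPLOYEE_ID),
    PROJECT_ID  INTEGER REFERENCES PROJECT(PROJECT_ID),
    PRIMARY KEY (EMPLOYEE_ID, PROJECT_ID)
);
-- Hypothetical sample data: Ann (123) and Bob share two projects,
-- Cy works on an unrelated one.
INSERT INTO EMPLOYEE VALUES (123, 'Ann'), (124, 'Bob'), (125, 'Cy');
INSERT INTO PROJECT  VALUES (1, 'P1'), (2, 'P2'), (3, 'P3');
INSERT INTO EMP_PROJ VALUES (123, 1), (123, 2), (124, 1), (124, 2), (125, 3);
""")

# Approach 1: nested IN subqueries (employee ID passed as a parameter).
in_query = """
SELECT EMP_NAME FROM EMPLOYEE
WHERE EMPLOYEE_ID IN (
    SELECT EMPLOYEE_ID FROM EMP_PROJ
    WHERE PROJECT_ID IN (
        SELECT PROJECT_ID FROM EMP_PROJ p, EMPLOYEE e
        WHERE p.EMPLOYEE_ID = e.EMPLOYEE_ID
        AND e.EMPLOYEE_ID = ?))
"""

# Approach 2: derived table joined to EMP_PROJ and EMPLOYEE.
join_query = """
SELECT c.EMP_NAME
FROM (SELECT PROJECT_ID FROM EMP_PROJ WHERE EMPLOYEE_ID = ?) a
JOIN EMP_PROJ b ON a.PROJECT_ID = b.PROJECT_ID
JOIN EMPLOYEE c ON b.EMPLOYEE_ID = c.EMPLOYEE_ID
"""

in_rows = sorted(row[0] for row in conn.execute(in_query, (123,)))
join_rows = sorted(row[0] for row in conn.execute(join_query, (123,)))

print(in_rows)    # ['Ann', 'Bob']
print(join_rows)  # ['Ann', 'Ann', 'Bob', 'Bob']
```

If the duplicates are not wanted, adding DISTINCT to the JOIN version makes the two results match on this sample data. That also changes the work the database has to do, so the plans would need to be compared again.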
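Right now I expect around 5,000 employees and 5,000 projects, but I have no idea what the many-to-many relationship between them will look like. Which approach would you recommend? Thanks!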
EDIT: Execution plan of Approach 1:
"Hash Join (cost=86.55..106.11 rows=200 width=98)"
"  Hash Cond: (employee.employee_id = emp_proj.employee_id)"
"  -> Seq Scan on employee (cost=0.00..16.10 rows=610 width=102)"
"  -> Hash (cost=85.07..85.07 rows=118 width=4)"
"        -> HashAggregate (cost=83.89..85.07 rows=118 width=4)"
"              -> Hash Semi Join (cost=45.27..83.60 rows=118 width=4)"
"                    Hash Cond: (emp_proj.project_id = p.project_id)"
"                    -> Seq Scan on emp_proj (cost=0.00..31.40 rows=2140 width=8)"
"                    -> Hash (cost=45.13..45.13 rows=11 width=4)"
"                          -> Nested Loop (cost=0.00..45.13 rows=11 width=4)"
"                                -> Index Scan using employee_pkey on employee e (cost=0.00..8.27 rows=1 width=4)"
"                                      Index Cond: (employee_id = 123)"
"                                -> Seq Scan on emp_proj p (cost=0.00..36.75 rows=11 width=8)"
"                                      Filter: (p.employee_id = 123)"
Execution plan of Approach 2:
"Nested Loop (cost=60.61..112.29 rows=118 width=98)"
"  -> Index Scan using employee_pkey on employee e (cost=0.00..8.27 rows=1 width=4)"
"        Index Cond: (employee_id = 123)"
"  -> Hash Join (cost=60.61..102.84 rows=118 width=102)"
"        Hash Cond: (b.employee_id = c.employee_id)"
"        -> Hash Join (cost=36.89..77.49 rows=118 width=8)"
"              Hash Cond: (b.project_id = p.project_id)"
"              -> Seq Scan on emp_proj b (cost=0.00..31.40 rows=2140 width=8)"
"              -> Hash (cost=36.75..36.75 rows=11 width=8)"
"                    -> Seq Scan on emp_proj p (cost=0.00..36.75 rows=11 width=8)"
"                          Filter: (employee_id = 123)"
"        -> Hash (cost=16.10..16.10 rows=610 width=102)"
"              -> Seq Scan on employee c (cost=0.00..16.10 rows=610 width=102)"
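So it looks like the execution plan for Approach 2 is slightly better, because its 'cost' is 60 rather than the 85 of Approach 1. Is that the right way to analyze this?
And how would one know that this will still hold true across all sorts of many-to-many combinations?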
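You can find out which approach executes faster by looking at the execution plans of the 2 alternatives. – Icarus
Yes, I know, but I have never been able to understand execution plans. – rk2010
Post the execution plans here; that is the most useful information if you want to identify your bottleneck. – Icarus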