2010-10-19 38 views
0

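Excluding join results where duplicate key matches occur

I work with data from multiple sources that I have no control over. These sources often have duplicates in their "key" values. I need to keep these duplicated values from matching in a join.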

Using the data below:

T1 
| ID | FirstKey | SecondKey | ThirdKey | AdditionalColumns | 
+----+----------+-----------+----------+---------------------+ 
| 01 | Prod1 | ABC1  | 201  | Jun 2010, A, 101 | 
| 02 | Prod2 | DEF2  | 202  | May 2009, A, 101 | 
| 03 | Prod2 | DEF2  | 202  | May 2010, S, 101 | 
| 04 | Prod3 |   | 206  | Jun 2010, A, 103 | 
| 05 | Prod4 |   | 207  | Jun 2011, S, 103 | 


T2 
| ID | FirstKey | SecondKey | ThirdKey | AdditionalColumns | 
+----+----------+-----------+----------+---------------------+ 
| 01 | Prod1 | ABC1  | 201  | Jun 2010, A, 101 | 
| 02 | Prod2 | DEF2  |   | May 2009, A, 101 | 
| 03 | Prod2 | DEF2  | 202  | May 2010, S, 101 | 
| 04 | Prod3 |   |   | Jun 2010, A, 103 | 
| 05 | Prod4 |   | 207  | Jun 2011, S, 103 | 
| 06 | Prod1 | ABC1  | 201  | Jun 2010, T, 101 | 

Now, if we run the query:

SELECT 
     T1.FirstKey, T1.SecondKey, T1.ThirdKey, 
     T2.FirstKey, T2.SecondKey, T2.ThirdKey, 
     T1.AdditionalColumns, T2.AdditionalColumns 
FROM 
     T1 JOIN T2 ON T1.FirstKey = T2.FirstKey 
      AND T1.SecondKey = T2.SecondKey 
      AND T1.SecondKey IS NOT NULL 
UNION 
SELECT 
     T1.FirstKey, T1.SecondKey, T1.ThirdKey, 
     T2.FirstKey, T2.SecondKey, T2.ThirdKey, 
     T1.AdditionalColumns, T2.AdditionalColumns 
FROM 
     T1 JOIN T2 ON T1.FirstKey = T2.FirstKey 
      AND T1.ThirdKey = T2.ThirdKey 
      AND T1.SecondKey IS NULL 

we get the following results:

FirstKey SecondKey ThirdKey FirstKey SecondKey ThirdKey AdditionalColumns AdditionalColumns 
-------- --------- -------- -------- --------- -------- ----------------- ----------------- 
Prod1  ABC1  201  Prod1  ABC1  201  Jun 2010, A, 101 Jun 2010, A, 101 
Prod1  ABC1  201  Prod1  ABC1  201  Jun 2010, A, 101 Jun 2010, T, 101 
Prod2  DEF2  202  Prod2  DEF2  202  May 2009, A, 101 May 2010, S, 101 
Prod2  DEF2  202  Prod2  DEF2  202  May 2010, S, 101 May 2010, S, 101 
Prod4  NULL  207  Prod4  NULL  207  Jun 2011, S, 103 Jun 2011, A, 103 

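I need the query to return only records with an authoritative match, that is, rows where there is exactly 1 match between the tables.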

FirstKey SecondKey ThirdKey FirstKey SecondKey ThirdKey AdditionalColumns AdditionalColumns 
-------- --------- -------- -------- --------- -------- ----------------- ----------------- 
Prod4  NULL  207  Prod4  NULL  207  Jun 2011, S, 103 Jun 2011, A, 103 

Is there a way to do this in the JOIN?

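Currently I get this result by writing a CTE for each table that guarantees the uniqueness of the keys used in the join. This works, but it is ugly and adds significant work to the query.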
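The per-table uniqueness approach described above might look roughly like the following. This is only a sketch: the asker's actual CTEs are not shown, so the CTE names (`T1u`, `T2u`), the count columns (`n_second`, `n_third`), the Python/SQLite harness, and the choice to store the blank key cells as NULL are all assumptions. It keeps a row only if its (FirstKey, SecondKey) or (FirstKey, ThirdKey) pair occurs once in its own table, and it needs SQLite 3.25 or later for window functions.

```python
import sqlite3

# Sample rows from the question; blank key cells are stored as NULL (assumption).
T1_ROWS = [
    ("01", "Prod1", "ABC1", "201", "Jun 2010, A, 101"),
    ("02", "Prod2", "DEF2", "202", "May 2009, A, 101"),
    ("03", "Prod2", "DEF2", "202", "May 2010, S, 101"),
    ("04", "Prod3", None, "206", "Jun 2010, A, 103"),
    ("05", "Prod4", None, "207", "Jun 2011, S, 103"),
]
T2_ROWS = [
    ("01", "Prod1", "ABC1", "201", "Jun 2010, A, 101"),
    ("02", "Prod2", "DEF2", None, "May 2009, A, 101"),
    ("03", "Prod2", "DEF2", "202", "May 2010, S, 101"),
    ("04", "Prod3", None, None, "Jun 2010, A, 103"),
    ("05", "Prod4", None, "207", "Jun 2011, S, 103"),
    ("06", "Prod1", "ABC1", "201", "Jun 2010, T, 101"),
]


def load_sample(conn):
    """Create T1 and T2 and fill them with the sample rows."""
    for name, rows in (("T1", T1_ROWS), ("T2", T2_ROWS)):
        conn.execute(
            f"CREATE TABLE {name} (ID TEXT, FirstKey TEXT, SecondKey TEXT, "
            "ThirdKey TEXT, AdditionalColumns TEXT)"
        )
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?)", rows)


# One CTE per table annotates each row with how often its join keys occur
# in that table; the joins then keep only rows whose key occurs exactly once.
UNIQUE_KEY_JOIN = """
WITH
T1u AS (
    SELECT *,
           COUNT(*) OVER (PARTITION BY FirstKey, SecondKey) AS n_second,
           COUNT(*) OVER (PARTITION BY FirstKey, ThirdKey)  AS n_third
    FROM T1
),
T2u AS (
    SELECT *,
           COUNT(*) OVER (PARTITION BY FirstKey, SecondKey) AS n_second,
           COUNT(*) OVER (PARTITION BY FirstKey, ThirdKey)  AS n_third
    FROM T2
)
SELECT a.FirstKey, a.SecondKey, a.ThirdKey,
       a.AdditionalColumns, b.AdditionalColumns
FROM T1u AS a JOIN T2u AS b
  ON a.FirstKey = b.FirstKey AND a.SecondKey = b.SecondKey
WHERE a.n_second = 1 AND b.n_second = 1
UNION ALL
SELECT a.FirstKey, a.SecondKey, a.ThirdKey,
       a.AdditionalColumns, b.AdditionalColumns
FROM T1u AS a JOIN T2u AS b
  ON a.FirstKey = b.FirstKey AND a.ThirdKey = b.ThirdKey
 AND a.SecondKey IS NULL
WHERE a.n_third = 1 AND b.n_third = 1
"""

conn = sqlite3.connect(":memory:")
load_sample(conn)
rows = conn.execute(UNIQUE_KEY_JOIN).fetchall()
for row in rows:
    print(row)  # on the sample data only the Prod4 row remains
```

The extra CTEs and count columns are the kind of overhead the question is complaining about: every join key needs its own uniqueness check in every table.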

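Is there another way to write this join that excludes the duplicate matches? Assume that I cannot programmatically exclude any of the duplicate rows based on the AdditionalColumns data.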

I run into this problem over and over, so the CTE approach feels kludgey; this must be an already-solved problem.

Answers

1

How about using GROUP BY in your query:

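-- Group the join result by the key columns only and keep the groups with a single match.
-- The inner columns are aliased because the outer query cannot refer to T1/T2 directly.
SELECT T1FirstKey, T1SecondKey, T1ThirdKey, T2FirstKey, T2SecondKey, T2ThirdKey,
       MIN(T1AdditionalColumns) AS T1AdditionalColumns,  -- one row per group, so MIN returns that row's value
       MIN(T2AdditionalColumns) AS T2AdditionalColumns,
       COUNT(*) AS MatchCount
FROM (
SELECT 
     T1.FirstKey AS T1FirstKey, T1.SecondKey AS T1SecondKey, T1.ThirdKey AS T1ThirdKey, 
     T2.FirstKey AS T2FirstKey, T2.SecondKey AS T2SecondKey, T2.ThirdKey AS T2ThirdKey, 
     T1.AdditionalColumns AS T1AdditionalColumns, T2.AdditionalColumns AS T2AdditionalColumns 
FROM 
     T1 JOIN T2 ON T1.FirstKey = T2.FirstKey 
      AND T1.SecondKey = T2.SecondKey 
      AND T1.SecondKey IS NOT NULL 
UNION ALL  -- UNION ALL keeps duplicate rows so that they are counted
SELECT 
     T1.FirstKey, T1.SecondKey, T1.ThirdKey, 
     T2.FirstKey, T2.SecondKey, T2.ThirdKey, 
     T1.AdditionalColumns, T2.AdditionalColumns 
FROM 
     T1 JOIN T2 ON T1.FirstKey = T2.FirstKey 
      AND T1.ThirdKey = T2.ThirdKey 
      AND T1.SecondKey IS NULL 
) AS Matches
GROUP BY T1FirstKey, T1SecondKey, T1ThirdKey, T2FirstKey, T2SecondKey, T2ThirdKey 
HAVING COUNT(*) = 1; 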
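
A variation on the same GROUP BY / HAVING COUNT(*) = 1 idea is to count matches per source row instead of per key combination, so a row is kept only when it matches exactly one row on the other side and that row matches only it. The sketch below assumes that `ID` uniquely identifies a row in each table (true for the sample data, not stated in the question), stores the blank key cells as NULL, and uses an in-memory SQLite database from Python purely as a test harness; the names `m`, `id1`, and `id2` are made up for illustration.

```python
import sqlite3

# Sample rows from the question; blank key cells are stored as NULL (assumption).
T1_ROWS = [
    ("01", "Prod1", "ABC1", "201", "Jun 2010, A, 101"),
    ("02", "Prod2", "DEF2", "202", "May 2009, A, 101"),
    ("03", "Prod2", "DEF2", "202", "May 2010, S, 101"),
    ("04", "Prod3", None, "206", "Jun 2010, A, 103"),
    ("05", "Prod4", None, "207", "Jun 2011, S, 103"),
]
T2_ROWS = [
    ("01", "Prod1", "ABC1", "201", "Jun 2010, A, 101"),
    ("02", "Prod2", "DEF2", None, "May 2009, A, 101"),
    ("03", "Prod2", "DEF2", "202", "May 2010, S, 101"),
    ("04", "Prod3", None, None, "Jun 2010, A, 103"),
    ("05", "Prod4", None, "207", "Jun 2011, S, 103"),
    ("06", "Prod1", "ABC1", "201", "Jun 2010, T, 101"),
]

# m lists every matched (T1 row, T2 row) pair; the OR combines the two
# branches of the original UNION. The outer query keeps a pair only if each
# of its two rows appears in exactly one pair.
SINGLE_MATCH = """
WITH m AS (
    SELECT T1.ID AS id1, T2.ID AS id2
    FROM T1 JOIN T2
      ON T1.FirstKey = T2.FirstKey
     AND (   (T1.SecondKey IS NOT NULL AND T1.SecondKey = T2.SecondKey)
          OR (T1.SecondKey IS NULL     AND T1.ThirdKey  = T2.ThirdKey))
)
SELECT T1.FirstKey, T1.SecondKey, T1.ThirdKey,
       T1.AdditionalColumns, T2.AdditionalColumns
FROM m
JOIN T1 ON T1.ID = m.id1
JOIN T2 ON T2.ID = m.id2
WHERE m.id1 IN (SELECT id1 FROM m GROUP BY id1 HAVING COUNT(*) = 1)
  AND m.id2 IN (SELECT id2 FROM m GROUP BY id2 HAVING COUNT(*) = 1)
"""

conn = sqlite3.connect(":memory:")
for name, sample in (("T1", T1_ROWS), ("T2", T2_ROWS)):
    conn.execute(
        f"CREATE TABLE {name} (ID TEXT, FirstKey TEXT, SecondKey TEXT, "
        "ThirdKey TEXT, AdditionalColumns TEXT)"
    )
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?)", sample)

single_matches = conn.execute(SINGLE_MATCH).fetchall()
for row in single_matches:
    print(row)  # on the sample data only the Prod4 row remains
```

Counting per row rather than per key also catches the case where one T2 row is matched by different T1 rows through different keys, at the cost of relying on a row identifier.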
0

A suggestion.

Make your whole SELECT a subquery. Let's name it SUBQ.

Then you do something like this:

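-- SUBQ stands for the full UNION query; its two ThirdKey columns need distinct aliases first.
-- Selecting non-grouped columns alongside GROUP BY is a MySQL-style extension; stricter engines reject it.
SELECT * 
FROM (SUBQ) AS s   -- many engines require an alias on a derived table
GROUP BY `ThirdKey` 
HAVING COUNT(*) = 1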