2017-08-24 38 views
1

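SQL: finding duplicates across several fields (no unique ID), workaround

I am trying to find duplicate vendors in the database, using several fields from the VENDOR table and the vendor_address table. The problem is that the more inner joins I add, the more potential results the query loses. There are no duplicates in the vendor ID itself, but I want to find vendors that look like potential duplicates of each other.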

This is my query so far:

SELECT 
    o.vendor_id 
    ,o.vndr_name_shrt_user 
    ,O.COUNTRY 
    ,O.VENDOR_NAME_SHORT 
    ,B.POSTAL 
    ,B.ADDRESS1 
    ,SAME_ADDRESS_NB 
    ,SAME_POSTAL_NB 
    ,OC.SAME_SHORT_NAME 
    ,oc.SAME_USER_NUM 
FROM VENDOR o 

JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID 

INNER JOIN (
    SELECT vndr_name_shrt_user, COUNT(*) AS SAME_USER_NUM 
    FROM VENDOR 
    WHERE COUNTRY = 'CANADA' 
    AND VENDOR_STATUS = 'A' 
    GROUP BY vndr_name_shrt_user 
    HAVING COUNT(*) > 1 
) oc on o.vndr_name_shrt_user = oc.vndr_name_shrt_user 

INNER JOIN (SELECT VENDOR_NAME_SHORT, COUNT(*) AS SAME_SHORT_NAME 
    FROM VENDOR 
    WHERE COUNTRY = 'CANADA' 
    AND VENDOR_STATUS = 'A' 
    GROUP BY VENDOR_NAME_SHORT 
    HAVING COUNT(*) > 1 
) oc on o.VENDOR_NAME_SHORT = oc.VENDOR_NAME_SHORT 

INNER JOIN (SELECT POSTAL, COUNT(*) AS SAME_POSTAL_NB 
    FROM vendor_addr 
    WHERE COUNTRY = 'CANADA' 
    AND COUNTRY ='CANADA' 
    AND POSTAL != ' ' 
    GROUP BY POSTAL 
    HAVING COUNT(*) > 1 
) oc on b.POSTAL = oc.POSTAL 

INNER JOIN (SELECT ADDRESS1, COUNT(*) AS SAME_ADDRESS_NB 
    FROM ps_vendor_addr 
    WHERE COUNTRY = 'CANADA' 
    AND COUNTRY ='CANADA' 
    AND ADDRESS1 != ' ' 
    GROUP BY ADDRESS1 
    HAVING COUNT(*) > 1 
) oc on b.ADDRESS1 = oc.ADDRESS1 
WHERE O.COUNTRY ='CANADA' 
    AND B.COUNTY = 'CANADA'; 
+1

Why are you using inner joins? Use left outer joins wherever you don't want to lose data. – kazzi

+1

Please provide a [MCVE], including DDL statements for your tables, DML statements for some sample data, and the expected output for that data. – MT0

+0

Thank you, this is a tough one – DangerKev

Answers

0

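Your joins look a bit odd, for more than one reason. First, you use inner joins, which eliminate every vendor except those that show all of the duplicate indicators at once, and that is not what you want. Second, you use the same alias, oc, for all of the derived tables; that is not going to fly here, and you will not get very far with it.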
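To illustrate those two points, here is a minimal sketch of the join-based approach with left outer joins and one distinct alias per derived table. The aliases `u` and `n` are hypothetical names chosen for this example. The table and column names are taken from the question, and only two of the four indicators are shown to keep it short.

```sql
-- Sketch only: LEFT JOIN keeps vendors that match just one indicator.
-- Aliases u and n are illustrative, not from the original query.
SELECT o.vendor_id,
       o.vndr_name_shrt_user,
       o.vendor_name_short,
       u.same_user_num,     -- NULL when the user short name is not shared
       n.same_short_name    -- NULL when the short name is not shared
FROM vendor o
LEFT JOIN (
    SELECT vndr_name_shrt_user, COUNT(*) AS same_user_num
    FROM vendor
    WHERE country = 'CANADA'
      AND vendor_status = 'A'
    GROUP BY vndr_name_shrt_user
    HAVING COUNT(*) > 1
) u ON o.vndr_name_shrt_user = u.vndr_name_shrt_user
LEFT JOIN (
    SELECT vendor_name_short, COUNT(*) AS same_short_name
    FROM vendor
    WHERE country = 'CANADA'
      AND vendor_status = 'A'
    GROUP BY vendor_name_short
    HAVING COUNT(*) > 1
) n ON o.vendor_name_short = n.vendor_name_short
WHERE o.country = 'CANADA'
  -- keep a vendor as soon as at least one indicator matched
  AND (u.same_user_num IS NOT NULL OR n.same_short_name IS NOT NULL);
```

The final `OR` filter is what replaces the implicit "all indicators must match" behaviour of the chained inner joins.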

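Rather than doing it this way, I suggest you repeat your base query once for each duplicate flag, as follows (I removed the same_address_nb and same_postal_nb fields; you will see why):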

select 
    o.vendor_id 
    ,o.vndr_name_shrt_user 
    ,O.COUNTRY 
    ,O.VENDOR_NAME_SHORT 
    ,B.POSTAL 
    ,B.ADDRESS1 
    ,OC.SAME_SHORT_NAME 
    ,oc.SAME_USER_NUM 
from VENDOR o 
JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID 
WHERE O.COUNTRY ='CANADA' 
AND B.COUNTY = 'CANADA' 
AND ... 

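For each of these duplicate indicators, you would add a nested query like the one below in place of the ellipsis shown above. This example uses duplicates in vndr_name_shrt_user: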

select 
    o.vendor_id 
    ,o.vndr_name_shrt_user 
    ,O.COUNTRY 
    ,O.VENDOR_NAME_SHORT 
    ,B.POSTAL 
    ,B.ADDRESS1 
    ,OC.SAME_SHORT_NAME 
    ,oc.SAME_USER_NUM 
    ,'SAME_USER_NUM' as duplicateFlag 
from VENDOR o 
JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID 
WHERE O.COUNTRY ='CANADA' 
AND B.COUNTY = 'CANADA' 
AND o.vndr_name_shrt_user in 
(
    SELECT 
     vndr_name_shrt_user 
    FROM VENDOR 
    WHERE COUNTRY = o.country 
    AND VENDOR_STATUS = 'A' 
    GROUP BY vndr_name_shrt_user 
    HAVING COUNT(*) > 1 
) 

You can UNION ALL these queries together and then see all of the duplicates.

As a side note, the last two derived tables in your query check COUNTRY = 'CANADA' twice.

UPDATE: showing more than one duplicate flag

select 
    o.vendor_id 
    ,o.vndr_name_shrt_user 
    ,O.COUNTRY 
    ,O.VENDOR_NAME_SHORT 
    ,B.POSTAL 
    ,B.ADDRESS1 
    ,OC.SAME_SHORT_NAME 
    ,oc.SAME_USER_NUM 
    ,'SAME_USER_NUM' as duplicateFlag 
from VENDOR o 
JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID 
WHERE O.COUNTRY ='CANADA' 
AND B.COUNTY = 'CANADA' 
AND o.vndr_name_shrt_user in 
(
    SELECT 
     vndr_name_shrt_user 
    FROM VENDOR 
    WHERE COUNTRY = o.country 
    AND VENDOR_STATUS = 'A' 
    GROUP BY vndr_name_shrt_user 
    HAVING COUNT(*) > 1 
) 

UNION ALL 

select 
    o.vendor_id 
    ,o.vndr_name_shrt_user 
    ,O.COUNTRY 
    ,O.VENDOR_NAME_SHORT 
    ,B.POSTAL 
    ,B.ADDRESS1 
    ,OC.SAME_SHORT_NAME 
    ,oc.SAME_USER_NUM 
    ,'VENDOR_NAME_SHORT' as duplicateFlag 
from VENDOR o 
JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID 
WHERE O.COUNTRY ='CANADA' 
AND B.COUNTY = 'CANADA' 
AND o.VENDOR_NAME_SHORT in 
(
    SELECT 
     VENDOR_NAME_SHORT 
    FROM VENDOR 
    WHERE COUNTRY = o.country 
    AND VENDOR_STATUS = 'A' 
    GROUP BY VENDOR_NAME_SHORT 
    HAVING COUNT(*) > 1 
) 
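If one row per vendor is preferred over one row per flag, the combined result can be aggregated. The following is a sketch under a few assumptions: Oracle 11gR2 or later (for `LISTAGG`), the table and column names from the question, and a hypothetical output column `duplicate_flags`. It leaves out the `oc.` columns, because the derived table they came from is no longer part of the FROM clause, and it leaves out the address join so that each vendor appears only once.

```sql
-- Sketch only: collapse the UNION ALL result to one row per vendor,
-- listing every duplicate flag that vendor matched.
SELECT vendor_id,
       vndr_name_shrt_user,
       vendor_name_short,
       LISTAGG(duplicateFlag, ', ')
           WITHIN GROUP (ORDER BY duplicateFlag) AS duplicate_flags
FROM (
    SELECT o.vendor_id, o.vndr_name_shrt_user, o.vendor_name_short,
           'SAME_USER_NUM' AS duplicateFlag
    FROM vendor o
    WHERE o.country = 'CANADA'
      AND o.vndr_name_shrt_user IN (
            SELECT vndr_name_shrt_user
            FROM vendor
            WHERE country = 'CANADA'
              AND vendor_status = 'A'
            GROUP BY vndr_name_shrt_user
            HAVING COUNT(*) > 1)
    UNION ALL
    SELECT o.vendor_id, o.vndr_name_shrt_user, o.vendor_name_short,
           'VENDOR_NAME_SHORT' AS duplicateFlag
    FROM vendor o
    WHERE o.country = 'CANADA'
      AND o.vendor_name_short IN (
            SELECT vendor_name_short
            FROM vendor
            WHERE country = 'CANADA'
              AND vendor_status = 'A'
            GROUP BY vendor_name_short
            HAVING COUNT(*) > 1)
)
GROUP BY vendor_id, vndr_name_shrt_user, vendor_name_short;
```

A vendor that matches both indicators would then show `SAME_USER_NUM, VENDOR_NAME_SHORT` in the last column.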
+0

Since there is only one duplicate flag column, does that complete the query for the duplicated flags, or should I create 'SAME_USER_NUM' as duplicateFlag2? – DangerKev

+0

You would put the different duplicate flags in the last column; I will update with an example – Eli

+0

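On the updated query: should I remove OC.SAME_SHORT_NAME and oc.SAME_USER_NUM, since I created them in my original query? Also, I get a "too many results" error. Thanks a lot, by the way – DangerKev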

0

Let's set up some interesting data in which the duplicates are chained across different attributes:

CREATE TABLE data (ID, A, B, C) AS 
    SELECT 1, 1, 1, 1 FROM DUAL UNION ALL -- Related to #2 on column A 
    SELECT 2, 1, 2, 2 FROM DUAL UNION ALL -- Related to #1 on column A, #3 on B & C, #5 on C 
    SELECT 3, 2, 2, 2 FROM DUAL UNION ALL -- Related to #2 on columns B & C, #5 on C 
    SELECT 4, 3, 3, 3 FROM DUAL UNION ALL -- Related to #5 on column A 
    SELECT 5, 3, 4, 2 FROM DUAL UNION ALL -- Related to #2 and #3 on column C, #4 on A 
    SELECT 6, 5, 5, 5 FROM DUAL;   -- Unrelated 
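To make the chain visible before going further, a plain self-join can list the direct pairwise relationships described in the comments above. This is only an illustrative sketch against the `data` table just created; the output column name `related_id` is made up for the example.

```sql
-- Sketch only: list each pair of rows that share a value in A, B or C.
-- d1.id < d2.id reports every pair once and skips self-matches.
SELECT d1.id AS id,
       d2.id AS related_id
FROM data d1
JOIN data d2
  ON d1.id < d2.id
 AND (d1.a = d2.a OR d1.b = d2.b OR d1.c = d2.c)
ORDER BY d1.id, d2.id;

-- Pairs returned for the sample data:
-- (1,2)  (2,3)  (2,5)  (3,5)  (4,5)
```

Rows 1 to 5 are therefore all linked through some chain of pairs, while row 6 is linked to nothing.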

Now we can use analytic functions to get some of the relationships (without any joins):

SELECT d.*, 
     LEAST(
     FIRST_VALUE(id) OVER (PARTITION BY a ORDER BY id), 
     FIRST_VALUE(id) OVER (PARTITION BY b ORDER BY id), 
     FIRST_VALUE(id) OVER (PARTITION BY c ORDER BY id) 
     ) AS duplicate_of 
FROM data d; 

Which gives:

ID A B C DUPLICATE_OF
-- - - - ------------
 1 1 1 1            1
 2 1 2 2            1
 3 2 2 2            2
 4 3 3 3            4
 5 3 4 2            2
 6 5 5 5            6

However, this does not pick up that #4 is related to #5, which is related to #2 and, through it, to #1...

That can be found with a hierarchical query:

SELECT id, a, b, c, 
     CONNECT_BY_ROOT(id) AS duplicate_of 
FROM data 
CONNECT BY NOCYCLE (PRIOR a = a OR PRIOR b = b OR PRIOR c = c); 

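However, this produces many, many duplicate rows (it does not know where to start the hierarchy, so it takes each row in turn as the root). Instead, you can use the first query to give the hierarchical query a starting point: the rows where the ID and DUPLICATE_OF values are the same: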

SELECT id, a, b, c, 
     CONNECT_BY_ROOT(id) AS duplicate_of 
FROM (
    SELECT d.*, 
     LEAST(
      FIRST_VALUE(id) OVER (PARTITION BY a ORDER BY id), 
      FIRST_VALUE(id) OVER (PARTITION BY b ORDER BY id), 
      FIRST_VALUE(id) OVER (PARTITION BY c ORDER BY id) 
     ) AS duplicate_of 
    FROM data d 
) 
START WITH id = duplicate_of 
CONNECT BY NOCYCLE (PRIOR a = a OR PRIOR b = b OR PRIOR c = c); 

Which gives:

ID A B C DUPLICATE_OF
-- - - - ------------
 1 1 1 1            1
 2 1 2 2            1
 3 2 2 2            1
 4 3 3 3            1
 5 3 4 2            1
 1 1 1 1            4
 2 1 2 2            4
 3 2 2 2            4
 4 3 3 3            4
 5 3 4 2            4
 6 5 5 5            6

There are still some duplicate rows, because the search found a local minimum at #4... These duplicates can be removed with a simple GROUP BY:

SELECT id, a, b, c,
       MIN(duplicate_of) AS duplicate_of
FROM (
    SELECT id, a, b, c,
           CONNECT_BY_ROOT(id) AS duplicate_of
    FROM (
        SELECT d.*,
               LEAST(
                   FIRST_VALUE(id) OVER (PARTITION BY a ORDER BY id),
                   FIRST_VALUE(id) OVER (PARTITION BY b ORDER BY id),
                   FIRST_VALUE(id) OVER (PARTITION BY c ORDER BY id)
               ) AS duplicate_of
        FROM data d
    )
    START WITH id = duplicate_of
    CONNECT BY NOCYCLE (PRIOR a = a OR PRIOR b = b OR PRIOR c = c)
)
GROUP BY id, a, b, c;

This gives the output:

ID A B C DUPLICATE_OF
-- - - - ------------
 1 1 1 1            1
 2 1 2 2            1
 3 2 2 2            1
 4 3 3 3            1
 5 3 4 2            1
 6 5 5 5            6
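As a rough illustration of how the same pattern might map onto the question's VENDOR table, here is a sketch that is not tested against the real schema. It assumes that vendor_id plays the role of ID, that vndr_name_shrt_user and vendor_name_short are the attributes to compare, and that only active Canadian vendors are of interest; all of these are assumptions based on the question's query.

```sql
-- Sketch only: the chained-duplicate query applied to VENDOR.
-- The choice of comparison columns is an assumption; add or drop
-- columns in both the LEAST(...) list and the CONNECT BY condition.
SELECT vendor_id, vndr_name_shrt_user, vendor_name_short,
       MIN(duplicate_of) AS duplicate_of
FROM (
    SELECT vendor_id, vndr_name_shrt_user, vendor_name_short,
           CONNECT_BY_ROOT(vendor_id) AS duplicate_of
    FROM (
        SELECT v.vendor_id, v.vndr_name_shrt_user, v.vendor_name_short,
               LEAST(
                   FIRST_VALUE(v.vendor_id) OVER (
                       PARTITION BY v.vndr_name_shrt_user ORDER BY v.vendor_id),
                   FIRST_VALUE(v.vendor_id) OVER (
                       PARTITION BY v.vendor_name_short ORDER BY v.vendor_id)
               ) AS duplicate_of
        FROM vendor v
        WHERE v.country = 'CANADA'
          AND v.vendor_status = 'A'
    )
    START WITH vendor_id = duplicate_of
    CONNECT BY NOCYCLE (PRIOR vndr_name_shrt_user = vndr_name_shrt_user
                        OR PRIOR vendor_name_short = vendor_name_short)
)
GROUP BY vendor_id, vndr_name_shrt_user, vendor_name_short;
```

Two caveats apply to this sketch. A hierarchical query follows every path between related rows, so large groups of vendors sharing a value can make it slow. Rows with a NULL name are grouped together by `PARTITION BY` but never match in the `CONNECT BY` comparison, so it may be worth filtering them out first.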
+0

Trying to work through it now, thanks a lot – DangerKev

+0

The process takes a lot of time – DangerKev

+0

SELECT VENDOR_ID,VENDOR_NAME_SHORT,VNDR_NAME_SHRT_USR,NAME1, MIN(duplicate_of)AS duplicate_of FROM( SELECT VENDOR_ID,VENDOR_NAME_SHORT,VNDR_NAME_SHRT_USR,NAME1, CONNECT_BY_ROOT(VENDOR_ID )AS duplicate_of FROM(SELECT D. *, – DangerKev