2017-10-04

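SQL: Find duplicate records based on custom criteria

I need to find duplicates across two tables based on custom criteria. The rule below decides whether a record is a duplicate; if it is, only the most recent one should be shown: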

A record is considered a duplicate if the employee's Name and all of its EmployeePolicy CoverageId(s) exactly match another record.

--Employee Table 
EmployeeId Name Salary 
543   John 54000 
785   Alex 63000 
435   John 75000 
123   Alex 88000 
333   John 67000 

--EmployeePolicy Table 
EmployeePolicyId EmployeeId CoverageId 
1     543   8888 
2     543   7777 
3     785   5555 
4     435   8888 
5     435   7777 
6     123   4444 
7     333   8888 
8     333   7776 

For example, the duplicates in the sample above are:

EmployeeId Name Salary 
543  John 54000 
435  John 75000 

This is because they have matching names in the Employee table, and both have exactly the same set of distinct CoverageIds in the EmployeePolicy table.

Note: EmployeeId 333 also has Name = John, but it is not a match, because its CoverageIds differ from the other Johns' CoverageIds.
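To make the rule concrete, here is a minimal plain-Python sketch of the definition above: group employees by the key (Name, set of CoverageIds) and report any key shared by two or more employees. It assumes "most recent" means the highest EmployeeId, as the `ORDER BY tmp.EmployeeId DESC` in the attempt further down suggests; the function name `find_duplicate_groups` is hypothetical.

```python
from collections import defaultdict

# Sample data copied from the two tables above.
employees = {543: "John", 785: "Alex", 435: "John", 123: "Alex", 333: "John"}
policies = [(1, 543, 8888), (2, 543, 7777), (3, 785, 5555), (4, 435, 8888),
            (5, 435, 7777), (6, 123, 4444), (7, 333, 8888), (8, 333, 7776)]

def find_duplicate_groups(employees, policies):
    """Group EmployeeIds by (Name, set of CoverageIds); keep groups of 2 or more."""
    coverages = defaultdict(set)
    for _policy_id, emp_id, coverage_id in policies:
        coverages[emp_id].add(coverage_id)
    groups = defaultdict(list)
    for emp_id, name in employees.items():
        # frozenset makes the whole coverage set part of a hashable key.
        groups[(name, frozenset(coverages[emp_id]))].append(emp_id)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}

dups = find_duplicate_groups(employees, policies)
for (name, covs), ids in dups.items():
    # Assumption: the highest EmployeeId is the most recent record.
    print(name, sorted(covs), ids, "most recent:", max(ids))
# prints: John [7777, 8888] [435, 543] most recent: 543
```

Comparing whole sets per employee, rather than individual rows, is exactly what the SQL has to reproduce. Employees with no policies would all share the empty set, so they would group together by name; decide separately whether that case should count.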

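At first I tried to find the duplicates the old-fashioned way, by grouping the records and checking COUNT(*) > 1, but I quickly realized that doesn't work: under my plain-English definition, records whose sets of CoverageIds differ are not duplicates, and grouping on individual rows cannot express that.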
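The failure is easy to reproduce. The sketch below loads the sample tables into an in-memory SQLite database (SQLite stands in for SQL Server here, an assumption made only so the example is self-contained) and runs a per-row `GROUP BY Name, CoverageId ... HAVING COUNT(*) > 1`. Each (Name, CoverageId) pair is judged on its own, so coverage 8888 counts three Johns, including 333, whose full coverage set does not match.

```python
import sqlite3

# In-memory copy of the question's sample tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmployeeId INTEGER, Name TEXT, Salary INTEGER);
    CREATE TABLE EmployeePolicy (EmployeePolicyId INTEGER, EmployeeId INTEGER, CoverageId INTEGER);
    INSERT INTO Employee VALUES
        (543, 'John', 54000), (785, 'Alex', 63000), (435, 'John', 75000),
        (123, 'Alex', 88000), (333, 'John', 67000);
    INSERT INTO EmployeePolicy VALUES
        (1, 543, 8888), (2, 543, 7777), (3, 785, 5555), (4, 435, 8888),
        (5, 435, 7777), (6, 123, 4444), (7, 333, 8888), (8, 333, 7776);
""")

# Row-level grouping: it can only say "these rows share a coverage",
# never "these employees share the same *set* of coverages".
rows = conn.execute("""
    SELECT e.Name, ep.CoverageId, COUNT(*) AS n
    FROM Employee e
    JOIN EmployeePolicy ep ON e.EmployeeId = ep.EmployeeId
    GROUP BY e.Name, ep.CoverageId
    HAVING COUNT(*) > 1
    ORDER BY ep.CoverageId
""").fetchall()
print(rows)  # [('John', 7777, 2), ('John', 8888, 3)]
```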

By the same token, I tried something like this:

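-- Create and fill the temp table (SELECT ... INTO creates #tmp);
-- list the columns explicitly so EmployeeId is not duplicated
SELECT e.EmployeeId, e.Name, e.Salary, ep.CoverageId
INTO #tmp
FROM Employee e
JOIN EmployeePolicy ep ON e.EmployeeId = ep.EmployeeId;

-- Keep the highest EmployeeId for each (Name, CoverageId) row
SELECT info.*
FROM
(
    SELECT
     tmp.*,
     ROW_NUMBER() OVER(PARTITION BY tmp.Name, tmp.CoverageId ORDER BY tmp.EmployeeId DESC) AS RowNum
    FROM #tmp tmp
) info
WHERE
    info.RowNum = 1;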

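Again, this doesn't work, because SQL doesn't treat these as duplicates. I'm not sure how to translate my English definition of a duplicate into an SQL one.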
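Running the attempt on the sample data shows why. This sketch uses SQLite (an assumption for self-containment; `ROW_NUMBER()` needs SQLite 3.25 or later) and a derived table in place of `#tmp`. Because the partition key is (Name, CoverageId), the query deduplicates coverage rows, not employees: 543 survives once per coverage, and 333 is still returned through coverage 7776.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmployeeId INTEGER, Name TEXT, Salary INTEGER);
    CREATE TABLE EmployeePolicy (EmployeePolicyId INTEGER, EmployeeId INTEGER, CoverageId INTEGER);
    INSERT INTO Employee VALUES
        (543, 'John', 54000), (785, 'Alex', 63000), (435, 'John', 75000),
        (123, 'Alex', 88000), (333, 'John', 67000);
    INSERT INTO EmployeePolicy VALUES
        (1, 543, 8888), (2, 543, 7777), (3, 785, 5555), (4, 435, 8888),
        (5, 435, 7777), (6, 123, 4444), (7, 333, 8888), (8, 333, 7776);
""")

# The question's ROW_NUMBER attempt; the derived table "tmp" replaces #tmp.
rows = conn.execute("""
    SELECT EmployeeId, Name, CoverageId
    FROM (
        SELECT tmp.*,
               ROW_NUMBER() OVER (PARTITION BY tmp.Name, tmp.CoverageId
                                  ORDER BY tmp.EmployeeId DESC) AS RowNum
        FROM (SELECT e.EmployeeId, e.Name, e.Salary, ep.CoverageId
              FROM Employee e
              JOIN EmployeePolicy ep ON e.EmployeeId = ep.EmployeeId) tmp
    ) info
    WHERE info.RowNum = 1
    ORDER BY EmployeeId, CoverageId
""").fetchall()
print(rows)
# [(123, 'Alex', 4444), (333, 'John', 7776), (543, 'John', 7777),
#  (543, 'John', 8888), (785, 'Alex', 5555)]
```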

Any help is greatly appreciated.


Populate the temp table with sample data and show the expected result.

Answers


The simplest way is to concatenate the policies into a string. Alas, that is cumbersome in SQL Server. Here is a set-based approach:

with ep as (
     select ep.*, count(*) over (partition by employeeid) as cnt 
     from employeepolicy ep 
    ) 
select ep.employeeid, ep2.employeeid 
from ep join 
    ep ep2 
    on ep.employeeid < ep2.employeeid and 
     ep.CoverageId = ep2.CoverageId and 
     ep.cnt = ep2.cnt 
group by ep.employeeid, ep2.employeeid, ep.cnt 
having count(*) = ep.cnt -- all coverages match (unqualified cnt is ambiguous: ep and ep2 both have it)

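The idea is to match coverages across different employees. A simple first criterion is that the number of coverages must be the same. The HAVING clause then checks that the number of matching coverages equals that total count.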
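Here is a minimal sketch that runs this set-based query against the sample data, using SQLite as a stand-in for SQL Server (an assumption; `COUNT(*) OVER` needs SQLite 3.25 or later). The only change is writing the `HAVING` clause as `ep.cnt`, because both `ep` and `ep2` carry a `cnt` column, and the inner alias is renamed to `p` to keep it distinct from the CTE name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmployeePolicy (EmployeePolicyId INTEGER, EmployeeId INTEGER, CoverageId INTEGER);
    INSERT INTO EmployeePolicy VALUES
        (1, 543, 8888), (2, 543, 7777), (3, 785, 5555), (4, 435, 8888),
        (5, 435, 7777), (6, 123, 4444), (7, 333, 8888), (8, 333, 7776);
""")

pairs = conn.execute("""
    WITH ep AS (
        -- cnt = how many coverages each employee has
        SELECT p.*, COUNT(*) OVER (PARTITION BY p.EmployeeId) AS cnt
        FROM EmployeePolicy p
    )
    SELECT ep.EmployeeId, ep2.EmployeeId
    FROM ep
    JOIN ep ep2
      ON ep.EmployeeId < ep2.EmployeeId   -- each pair once
     AND ep.CoverageId = ep2.CoverageId   -- shared coverage
     AND ep.cnt = ep2.cnt                 -- same number of coverages
    GROUP BY ep.EmployeeId, ep2.EmployeeId, ep.cnt
    HAVING COUNT(*) = ep.cnt              -- every coverage matched
""").fetchall()
print(pairs)  # [(435, 543)]
```

Pairs 333/435 and 333/543 share only coverage 8888 (one match out of two), so the `HAVING` clause drops them. The comparison assumes an employee never lists the same CoverageId twice; repeated rows would inflate both the match count and `cnt`.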

Note: this returns pairs of employee IDs, one pair per row. You can join to the Employee table to get additional information.
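The query above compares coverages only, while the question also requires the names to match. One way to handle both, sketched here under the same SQLite assumption, is to wrap the pair query in a CTE (the name `pairs` and the columns `LowId`/`HighId` are hypothetical) and join both IDs back to Employee with a name check. If the highest EmployeeId is the most recent record, `HighId` is the one to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmployeeId INTEGER, Name TEXT, Salary INTEGER);
    CREATE TABLE EmployeePolicy (EmployeePolicyId INTEGER, EmployeeId INTEGER, CoverageId INTEGER);
    INSERT INTO Employee VALUES
        (543, 'John', 54000), (785, 'Alex', 63000), (435, 'John', 75000),
        (123, 'Alex', 88000), (333, 'John', 67000);
    INSERT INTO EmployeePolicy VALUES
        (1, 543, 8888), (2, 543, 7777), (3, 785, 5555), (4, 435, 8888),
        (5, 435, 7777), (6, 123, 4444), (7, 333, 8888), (8, 333, 7776);
""")

rows = conn.execute("""
    WITH ep AS (
        SELECT p.*, COUNT(*) OVER (PARTITION BY p.EmployeeId) AS cnt
        FROM EmployeePolicy p
    ),
    pairs AS (  -- identical coverage sets, as in the answer
        SELECT ep.EmployeeId AS LowId, ep2.EmployeeId AS HighId
        FROM ep
        JOIN ep ep2
          ON ep.EmployeeId < ep2.EmployeeId
         AND ep.CoverageId = ep2.CoverageId
         AND ep.cnt = ep2.cnt
        GROUP BY ep.EmployeeId, ep2.EmployeeId, ep.cnt
        HAVING COUNT(*) = ep.cnt
    )
    SELECT e1.EmployeeId, e1.Name, e1.Salary, e2.EmployeeId, e2.Name, e2.Salary
    FROM pairs
    JOIN Employee e1 ON e1.EmployeeId = pairs.LowId
    JOIN Employee e2 ON e2.EmployeeId = pairs.HighId
    WHERE e1.Name = e2.Name  -- the question's second condition
""").fetchall()
print(rows)  # [(435, 'John', 75000, 543, 'John', 54000)]
```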


I haven't tested this T-SQL, but I believe the following should give you the output you're looking for.

;WITH CTE_Employee 
AS 
(
    SELECT  E.[Name] 
       ,E.[EmployeeId] 
       ,P.[CoverageId] 
       ,E.[Salary] 
    FROM  Employee E 
    INNER JOIN EmployeePolicy P ON E.EmployeeId = P.EmployeeId 
) 
, CTE_DuplicateCoverage 
AS 
(
    SELECT  E.[Name] 
       ,E.[CoverageId] 
    FROM  CTE_Employee E 
    GROUP BY E.[Name], E.[CoverageId] 
    HAVING  COUNT(*) > 1 
) 
SELECT  E.[EmployeeId] 
      ,E.[Name] 
      ,MAX(E.[Salary]) AS [Salary] 
FROM  CTE_Employee E 
INNER JOIN CTE_DuplicateCoverage D ON E.[Name] = D.[Name] AND E.[CoverageId] = D.[CoverageId] 
GROUP BY E.[EmployeeId], E.[Name] 
HAVING  COUNT(*) > 1 
ORDER BY E.[EmployeeId]
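Since this answer is untested, here is a sketch that runs it in SQLite (an assumption; the square brackets and leading `;` are dropped, the logic is unchanged). On the question's sample it returns 435 and 543 as expected. The second, hypothetical dataset shows a limitation: the query checks that each coverage is shared by *some* same-named employee, not that whole coverage sets are identical, so three Johns with coverage sets {10, 20}, {10, 30} and {20, 30} are all flagged even though no two of them match.

```python
import sqlite3

# The answer's query, transcribed for SQLite.
SECOND_ANSWER_QUERY = """
WITH CTE_Employee AS (
    SELECT E.Name, E.EmployeeId, P.CoverageId, E.Salary
    FROM Employee E
    INNER JOIN EmployeePolicy P ON E.EmployeeId = P.EmployeeId
), CTE_DuplicateCoverage AS (
    SELECT Name, CoverageId
    FROM CTE_Employee
    GROUP BY Name, CoverageId
    HAVING COUNT(*) > 1
)
SELECT E.EmployeeId, E.Name, MAX(E.Salary) AS Salary
FROM CTE_Employee E
INNER JOIN CTE_DuplicateCoverage D
    ON E.Name = D.Name AND E.CoverageId = D.CoverageId
GROUP BY E.EmployeeId, E.Name
HAVING COUNT(*) > 1
ORDER BY E.EmployeeId
"""

def run_query(employees, policies):
    """Load (EmployeeId, Name, Salary) and (EmployeeId, CoverageId) rows, then run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (EmployeeId INTEGER, Name TEXT, Salary INTEGER)")
    conn.execute("CREATE TABLE EmployeePolicy (EmployeeId INTEGER, CoverageId INTEGER)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", employees)
    conn.executemany("INSERT INTO EmployeePolicy VALUES (?, ?)", policies)
    return conn.execute(SECOND_ANSWER_QUERY).fetchall()

# Question's sample (EmployeePolicyId omitted; the query never reads it).
sample = run_query(
    [(543, "John", 54000), (785, "Alex", 63000), (435, "John", 75000),
     (123, "Alex", 88000), (333, "John", 67000)],
    [(543, 8888), (543, 7777), (785, 5555), (435, 8888),
     (435, 7777), (123, 4444), (333, 8888), (333, 7776)],
)
print(sample)  # [(435, 'John', 75000), (543, 'John', 54000)]

# Hypothetical data: every coverage appears twice, but no two sets are equal.
counter = run_query(
    [(1, "John", 100), (2, "John", 200), (3, "John", 300)],
    [(1, 10), (1, 20), (2, 10), (2, 30), (3, 20), (3, 30)],
)
print(counter)  # [(1, 'John', 100), (2, 'John', 200), (3, 'John', 300)]
```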