2013-05-02 15 views
0

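Faster way to find duplicates in SQL Server

I'm trying to find a better way to find duplicates in SQL Server. The query below ran for over 20 minutes against just over 300 million records before results started to appear in the SSMS results window. It then ran for another 22 minutes before crashing.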

After SSMS had displayed 16,777,216 records, it threw this error:

An error occurred while executing batch. Error message is: Exception of type 'System.OutOfMemoryException' was thrown. 

Schema:

ENCOUNTER_NUM - numeric(22,0) 
CONCEPT_CD - varchar(50) 
PROVIDER_ID - varchar(50) 
START_DATE - datetime 
MODIFIER_CD - varchar(100) 
INSTANCE_NUM - numeric(18,0) 


SELECT 
    ROW_NUMBER() OVER (ORDER BY f1.[ENCOUNTER_NUM],f1.[CONCEPT_CD],f1.[PROVIDER_ID],f1.[START_DATE],f1.[MODIFIER_CD],f1.[INSTANCE_NUM]), 
    f1.[ENCOUNTER_NUM], 
    f1.[CONCEPT_CD], 
    f1.[PROVIDER_ID], 
    f1.[START_DATE], 
    f1.[MODIFIER_CD], 
    f1.[INSTANCE_NUM] 
FROM 
    [dbo].[I2B2_OBSERVATION_FACT] f1 
    INNER JOIN [dbo].[I2B2_OBSERVATION_FACT] f2 ON 
     f1.[ENCOUNTER_NUM] = f2.[ENCOUNTER_NUM] 
     AND f1.[CONCEPT_CD] = f2.[CONCEPT_CD] 
     AND f1.[PROVIDER_ID] = f2.[PROVIDER_ID] 
     AND f1.[START_DATE] = f2.[START_DATE] 
     AND f1.[MODIFIER_CD] = f2.[MODIFIER_CD] 
     AND f1.[INSTANCE_NUM] = f2.[INSTANCE_NUM] 

Answers

8

I don't know how much faster this is, but it's worth a try.

SELECT 
    COUNT(*) AS Dupes, 
    f1.[ENCOUNTER_NUM], 
    f1.[CONCEPT_CD], 
    f1.[PROVIDER_ID], 
    f1.[START_DATE], 
    f1.[MODIFIER_CD], 
    f1.[INSTANCE_NUM] 
FROM 
    [dbo].[I2B2_OBSERVATION_FACT] f1 
GROUP BY 
    f1.[ENCOUNTER_NUM], 
    f1.[CONCEPT_CD], 
    f1.[PROVIDER_ID], 
    f1.[START_DATE], 
    f1.[MODIFIER_CD], 
    f1.[INSTANCE_NUM] 
HAVING 
    COUNT(*) > 1 
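
If the goal is to list the surplus rows themselves rather than one summary line per duplicate group, a window function can do it in a single pass with no self-join. The following is only a sketch under the schema given in the question; the CTE name `numbered` and the alias `rn` are made up for illustration. Note that `PARTITION BY` (like `GROUP BY`) treats NULLs as equal to each other, whereas the `=` comparisons in the original join never match NULLs.

```sql
-- Sketch: number the rows inside each group of identical key values.
-- Rows with rn > 1 are the extra copies beyond the first in each group.
WITH numbered AS (
    SELECT
        f.[ENCOUNTER_NUM],
        f.[CONCEPT_CD],
        f.[PROVIDER_ID],
        f.[START_DATE],
        f.[MODIFIER_CD],
        f.[INSTANCE_NUM],
        ROW_NUMBER() OVER (
            PARTITION BY
                f.[ENCOUNTER_NUM], f.[CONCEPT_CD], f.[PROVIDER_ID],
                f.[START_DATE], f.[MODIFIER_CD], f.[INSTANCE_NUM]
            ORDER BY (SELECT NULL)  -- order within a group is irrelevant here
        ) AS rn  -- hypothetical alias
    FROM [dbo].[I2B2_OBSERVATION_FACT] AS f
)
SELECT
    [ENCOUNTER_NUM], [CONCEPT_CD], [PROVIDER_ID],
    [START_DATE], [MODIFIER_CD], [INSTANCE_NUM], rn
FROM numbered
WHERE rn > 1;
```

Unlike the self-join, where every row matches at least itself and so the whole table comes back, this returns only the surplus copies, which should also keep the SSMS result grid far smaller.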
+0

With the proper indexes this should be pretty fast – Phil 2013-05-02 17:58:49
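
One possible reading of "proper indexes" is a single nonclustered index whose key covers all six grouping columns, so the engine can read the rows already ordered instead of sorting or hashing them. This is a sketch, not a tested recommendation: the index name is hypothetical, and building it on a table of this size will itself take considerable time and space.

```sql
-- Sketch: composite index over the six columns used to detect duplicates.
-- IX_OBSERVATION_FACT_DupeCheck is a made-up name for illustration.
CREATE NONCLUSTERED INDEX IX_OBSERVATION_FACT_DupeCheck
    ON [dbo].[I2B2_OBSERVATION_FACT] (
        [ENCOUNTER_NUM],
        [CONCEPT_CD],
        [PROVIDER_ID],
        [START_DATE],
        [MODIFIER_CD],
        [INSTANCE_NUM]
    );
```

With the column types listed in the question, the maximum key size works out to roughly 230 bytes, which is within SQL Server's index key size limit.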

+2

+1, but I would use 'COUNT(*)' rather than 'COUNT(1)'... it states more clearly what you are doing: counting rows. – Matthew 2013-05-02 18:01:48
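
To illustrate the point about `COUNT(*)` versus `COUNT(1)`, here is a small self-contained sketch using a throwaway table variable (`@t` and its column `a` are invented for the example). Both forms count every row; only `COUNT(column)` skips NULLs.

```sql
-- Sketch: three rows, one of which holds NULL in column a.
DECLARE @t TABLE (a int NULL);
INSERT INTO @t (a) VALUES (1), (NULL), (1);

SELECT
    COUNT(*) AS star_count,      -- counts all rows: 3
    COUNT(1) AS one_count,       -- constant 1 is never NULL, so also 3
    COUNT(a) AS non_null_count   -- skips the NULL row: 2
FROM @t;
```

So the choice between the two is about readability, not about the result.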

+0

+1 A 300M x 300M join is no joke unless it runs as a merge join, which requires a very specific index. This should be much faster! – Andomar 2013-05-02 18:02:09