Deleting duplicate rows from a large table

I have a large table (~1,000,000 rows) that may contain duplicate values.

The table has two columns (say col a and col b) that together act as a unique key, plus an ID and a last-update date.

For example, I could have a table like this:

id | a   | b     | update
1  | jon | smith | 1/1
2  | don | smith | 2/5
3  | bob | david | 1/1
4  | dan | lewis | 3/1
5  | bob | david | 3/1

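As you can see, the rows with id 3 and id 5 hold the same values in both column a and column b. I want to delete rows that are duplicated in this way, but keep the one that was updated most recently.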

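For this example, the table after the delete would be:

id | a   | b     | update
1  | jon | smith | 1/1
2  | don | smith | 2/5
4  | dan | lewis | 3/1
5  | bob | david | 3/1

(id = 3 is deleted because the row with id = 5 already has a = bob and b = david, and the update in that row is later than the one in the deleted row.)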

Source

2011-11-29 doubleM

-- Delete every row for which a newer row with the same (a, b) exists
delete from MyTable
where exists (
    select 1 from MyTable t2
    where MyTable.a = t2.a and MyTable.b = t2.b and MyTable.upd < t2.upd
)

Source

2011-11-29 18:09:08 dasblinkenlight
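
The query above can be checked against the sample data from the question. What follows is a minimal sketch using Python's built-in sqlite3 module, not part of the original answer. The table name MyTable and the column upd come from the answer; the SQLite column types, the year 2011, and reading "2/5" as month/day are assumptions, made so that the dates can be stored as ISO strings that compare correctly as text.

```python
import sqlite3

# Sample rows from the question. Dates are ISO strings so that plain text
# comparison matches date order (the year and month/day order are assumptions).
ROWS = [
    (1, "jon", "smith", "2011-01-01"),
    (2, "don", "smith", "2011-02-05"),
    (3, "bob", "david", "2011-01-01"),
    (4, "dan", "lewis", "2011-03-01"),
    (5, "bob", "david", "2011-03-01"),
]


def dedupe_keep_latest(rows):
    """Load rows into an in-memory table, run the delete, return surviving ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable (id INTEGER PRIMARY KEY, a TEXT, b TEXT, upd TEXT)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)
    # Same statement as in the answer: a row goes away when another row
    # with the same (a, b) has a strictly later upd.
    conn.execute(
        """
        DELETE FROM MyTable
        WHERE EXISTS (
            SELECT 1 FROM MyTable t2
            WHERE MyTable.a = t2.a AND MyTable.b = t2.b AND MyTable.upd < t2.upd
        )
        """
    )
    ids = [row[0] for row in conn.execute("SELECT id FROM MyTable ORDER BY id")]
    conn.close()
    return ids


print(dedupe_keep_latest(ROWS))
```

With the sample rows, only id 3 is removed, which matches the result the question asks for.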

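You need two self-references in the WHERE clause. The first identifies the rows that are duplicated; the second makes sure you do not delete the most recent version.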

DELETE
FROM TestCase
WHERE EXISTS (
        -- Where there's more than one
        SELECT 1
        FROM TestCase AS Reference
        WHERE TestCase.a = Reference.a
          AND TestCase.b = Reference.b
          AND TestCase.[update] <> Reference.[update]
    )
    AND TestCase.[update] <> (
        -- and this isn't the most recent
        SELECT MAX(Reference2.[update])
        FROM TestCase AS Reference2
        WHERE TestCase.a = Reference2.a
          AND TestCase.b = Reference2.b
        GROUP BY Reference2.a,
                 Reference2.b
    )

Source

2011-11-29 18:17:35

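A single self-reference should be enough, because the inequality on the latest update is already sufficient to keep a row from matching itself. – dasblinkenlight

You're right, dasblinkenlight. On a table this large, the performance gain will be significant. Kudos. ;) –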
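
One possible reading of the comment above is that the second condition can stand alone: a row whose update differs from the maximum of its (a, b) group necessarily has a newer sibling. The sketch below tries that with Python's sqlite3 module. The column is called upd here rather than [update], the ISO date strings are an assumption, and the rows with ids 6 and 7 are hypothetical additions that show what happens on a tie.

```python
import sqlite3

# Rows from the question plus two hypothetical rows (ids 6 and 7) that share
# both the key and the update date. Dates are ISO strings (an assumption).
ROWS = [
    (1, "jon", "smith", "2011-01-01"),
    (2, "don", "smith", "2011-02-05"),
    (3, "bob", "david", "2011-01-01"),
    (4, "dan", "lewis", "2011-03-01"),
    (5, "bob", "david", "2011-03-01"),
    (6, "amy", "ng", "2011-04-01"),
    (7, "amy", "ng", "2011-04-01"),
]


def keep_latest_per_key(rows):
    """Delete every row that is not the latest of its (a, b) group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TestCase (id INTEGER PRIMARY KEY, a TEXT, b TEXT, upd TEXT)"
    )
    conn.executemany("INSERT INTO TestCase VALUES (?, ?, ?, ?)", rows)
    # Single self-reference: compare each row with the newest upd of its group.
    conn.execute(
        """
        DELETE FROM TestCase
        WHERE TestCase.upd <> (
            SELECT MAX(r.upd)
            FROM TestCase AS r
            WHERE r.a = TestCase.a AND r.b = TestCase.b
        )
        """
    )
    ids = [row[0] for row in conn.execute("SELECT id FROM TestCase ORDER BY id")]
    conn.close()
    return ids


print(keep_latest_per_key(ROWS))
```

Rows that tie on the latest update (ids 6 and 7 here) all survive, because none of them differs from the group maximum. If exact ties can occur in the real table, a further tie-breaker such as the id would be needed.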

The following should work.

-- Keeps the row with the highest ID in each (A, B) group
DELETE FROM MYTABLE
WHERE ID IN (
    SELECT M1.ID
    FROM MYTABLE M1, MYTABLE M2
    WHERE M1.A = M2.A AND M1.B = M2.B AND M1.ID < M2.ID
);

Source

2011-11-29 19:57:51 Teja
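
The ID-based query above compares IDs rather than update dates, so it keeps the latest update only when a higher ID always goes with a later update. That holds for the question's sample, but the sketch below, written with Python's sqlite3 module, uses hypothetical rows where it does not. The rows, the upd column name and the ISO date strings are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical rows: id 3 was updated AFTER id 5, so ID order and
# update order disagree for the (bob, david) key.
ROWS = [
    (1, "jon", "smith", "2011-01-01"),
    (3, "bob", "david", "2011-03-01"),
    (5, "bob", "david", "2011-01-01"),
]


def dedupe_by_id(rows):
    """Run the ID-based delete and return the surviving (id, upd) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MYTABLE (ID INTEGER PRIMARY KEY, A TEXT, B TEXT, UPD TEXT)"
    )
    conn.executemany("INSERT INTO MYTABLE VALUES (?, ?, ?, ?)", rows)
    # Same statement as in the answer: delete a row when another row
    # with the same (A, B) has a higher ID.
    conn.execute(
        """
        DELETE FROM MYTABLE
        WHERE ID IN (
            SELECT M1.ID
            FROM MYTABLE M1, MYTABLE M2
            WHERE M1.A = M2.A AND M1.B = M2.B AND M1.ID < M2.ID
        )
        """
    )
    survivors = conn.execute("SELECT ID, UPD FROM MYTABLE ORDER BY ID").fetchall()
    conn.close()
    return survivors


print(dedupe_by_id(ROWS))
```

Here the row with id 5 survives even though id 3 carried the later update, so this variant fits only when IDs are known to increase with update time.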
