2010-01-29 168 views
4

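Delete duplicate entries via SQL?

Is it possible in SQL to delete (only one of) the duplicate entries for a combination of columns (here: city, zip)? So, if I have this SQL: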

INSERT INTO foo (id, city, zip) VALUES (1, 'New York', '00000'); 
INSERT INTO foo (id, city, zip) VALUES (2, 'New York', '00000'); 

Can I delete the first one afterwards with an SQL statement? My approach does not work for this:

DELETE FROM foo (id, city, zip) 
     WHERE id IN 
      (SELECT id FROM foo GROUP BY id HAVING (COUNT(zip) > 1)) 
+2

Delete only one, or leave only one? That matters as soon as you have 3 matching entries. – Lucero 2010-01-29 11:43:48

+0

Only one. – codevour 2010-01-29 11:52:18

Answers

6

Adapted from this article. Both solutions are generic and should work on any reasonable SQL implementation.

Delete the duplicates in place:

DELETE T1 
FROM foo T1, foo T2 
WHERE (T1.city = T2.city AND T1.zip = T2.zip) -- Duplicate rows 
    AND T1.id > T2.id;       -- Delete the one with higher id 

Simple, and it should work fine for small tables or tables with few duplicates.
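To see the "keep the lowest id per (city, zip)" rule in action, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite does not accept the multi-table `DELETE T1 FROM ...` form, so this sketch swaps in a correlated `EXISTS` subquery that expresses the same condition. The third New York row and the Boston row are made-up sample data, not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, city TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO foo (id, city, zip) VALUES (?, ?, ?)",
    [
        (1, "New York", "00000"),
        (2, "New York", "00000"),
        (3, "New York", "00000"),  # hypothetical third duplicate
        (4, "Boston", "11111"),    # hypothetical row without duplicates
    ],
)

# Delete every row for which another row with the same (city, zip)
# and a lower id exists, so only the lowest id of each group survives.
conn.execute(
    """
    DELETE FROM foo
    WHERE EXISTS (
        SELECT 1
        FROM foo AS t2
        WHERE t2.city = foo.city
          AND t2.zip = foo.zip
          AND t2.id < foo.id
    )
    """
)

remaining = conn.execute("SELECT id, city, zip FROM foo ORDER BY id").fetchall()
print(remaining)
```

With three matching rows, both higher ids are removed, which is the "leave only one" behaviour rather than "delete only one".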

Copy the distinct records to another table:

CREATE TABLE foo_temp LIKE foo; 
INSERT INTO foo_temp (city, zip) SELECT DISTINCT city, zip FROM foo; 
TRUNCATE TABLE foo; 

If you are lucky enough to have an auto-generated id (such as a sequence or auto-increment column), simply:

INSERT INTO foo SELECT * FROM foo_temp; 
DROP TABLE foo_temp; 

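A bit more complicated, but very efficient for very large tables with many duplicates. For those, creating an index on (city, zip) will improve query performance enormously.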
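The copy-and-reload approach can be tried end to end with Python's built-in `sqlite3` module. This is only a sketch under some assumptions: SQLite has neither `CREATE TABLE ... LIKE` nor `TRUNCATE TABLE`, so an explicit `CREATE TABLE` and a plain `DELETE FROM` stand in for them, and the index name `idx_foo_city_zip` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE foo (id INTEGER PRIMARY KEY, city TEXT, zip TEXT);
    INSERT INTO foo (city, zip) VALUES ('New York', '00000');
    INSERT INTO foo (city, zip) VALUES ('New York', '00000');
    INSERT INTO foo (city, zip) VALUES ('Boston', '11111');

    -- Composite index on the columns that define a duplicate (hypothetical name).
    CREATE INDEX idx_foo_city_zip ON foo (city, zip);

    -- Stand-in for CREATE TABLE ... LIKE, which SQLite does not have.
    CREATE TABLE foo_temp (id INTEGER PRIMARY KEY, city TEXT, zip TEXT);

    -- id is omitted, so a fresh id is generated for each distinct pair.
    INSERT INTO foo_temp (city, zip) SELECT DISTINCT city, zip FROM foo;

    -- Stand-in for TRUNCATE TABLE, which SQLite does not have.
    DELETE FROM foo;

    INSERT INTO foo SELECT * FROM foo_temp;
    DROP TABLE foo_temp;
    """
)

pairs = conn.execute("SELECT city, zip FROM foo ORDER BY city, zip").fetchall()
print(pairs)
```

Note that the surviving rows get newly generated ids, so this variant only fits when nothing else refers to the old id values.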

+1

"Work in progress" - I'll have to remember to do that too when editing in future ;) – Lucero 2010-01-29 11:47:27

+0

Yes. I posted the general idea first to keep others from wasting their time racing in with the same idea. – 2010-01-29 11:59:20

1

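Since different dialects have different features, it is not clear which SQL is supported in your case. What comes to mind is to use a rank over zip in the inner query instead of HAVING, and to include only the rows with rank > 1.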
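A minimal sketch of that ranking idea, run through Python's built-in `sqlite3` module. It assumes an SQLite build with window-function support (3.25 or later) and partitions by (city, zip) rather than zip alone, to match the question. The Boston row is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, city TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO foo (id, city, zip) VALUES (?, ?, ?)",
    [
        (1, "New York", "00000"),
        (2, "New York", "00000"),
        (3, "Boston", "11111"),  # hypothetical row without duplicates
    ],
)

# Rank the rows of each (city, zip) group by id. Because id is unique,
# the ranks are 1, 2, 3, ... and everything ranked above 1 is a duplicate.
conn.execute(
    """
    DELETE FROM foo
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   RANK() OVER (PARTITION BY city, zip ORDER BY id) AS rnk
            FROM foo
        ) AS ranked
        WHERE rnk > 1
    )
    """
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM foo ORDER BY id")]
print(remaining_ids)
```

The inner query replaces the `GROUP BY ... HAVING` attempt from the question: instead of counting per group, it numbers the rows inside each group so that individual rows can be picked out.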

+0

SQL98 would be best – codevour 2010-01-29 11:53:43

2

SQL Server 2005 and later:

WITH q AS 
     (
     SELECT *, 
       ROW_NUMBER() OVER (PARTITION BY city, zip ORDER BY id) AS rn, 
       COUNT(*) OVER (PARTITION BY city, zip) AS cnt 
     FROM mytable 
     ) 
DELETE 
FROM q 
WHERE rn = 1 
     AND cnt > 1 

deletes the first row (of those that have duplicates),

WITH q AS 
     (
     SELECT *, ROW_NUMBER() OVER (PARTITION BY city, zip ORDER BY id) AS rn 
     FROM mytable 
     ) 
DELETE 
FROM q 
WHERE rn = 2 

deletes the first duplicate,

WITH q AS 
     (
     SELECT *, ROW_NUMBER() OVER (PARTITION BY city, zip ORDER BY id) AS rn 
     FROM mytable 
     ) 
DELETE 
FROM q 
WHERE rn > 1 

deletes all duplicates.
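The difference between the three predicates only shows up once a group has three or more matching rows. The following sketch uses Python's built-in `sqlite3` module to list which ids each predicate would target. Since SQLite cannot delete through a CTE the way SQL Server can, it selects the ids instead of deleting them. It assumes SQLite 3.25 or later for window functions, and the helper `ids_matching` and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, city TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, city, zip) VALUES (?, ?, ?)",
    [
        (1, "New York", "00000"),
        (2, "New York", "00000"),
        (3, "New York", "00000"),
        (4, "Boston", "11111"),
    ],
)

QUERY = """
WITH q AS (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY city, zip ORDER BY id) AS rn,
           COUNT(*) OVER (PARTITION BY city, zip) AS cnt
    FROM mytable
)
SELECT id FROM q WHERE {predicate} ORDER BY id
"""


def ids_matching(predicate):
    """Return the ids that a DELETE with this predicate would target."""
    return [row[0] for row in conn.execute(QUERY.format(predicate=predicate))]


first_of_group = ids_matching("rn = 1 AND cnt > 1")  # first row of a duplicated group
first_duplicate = ids_matching("rn = 2")             # first duplicate only
all_duplicates = ids_matching("rn > 1")              # every duplicate
print(first_of_group, first_duplicate, all_duplicates)
```

Only the last predicate leaves exactly one row per (city, zip) group when a group has more than two members.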

+0

+1 - that is what I had in mind, but I am not fluent enough in it to just write it down. – Lucero 2010-01-29 11:53:42

1
DELETE FROM 
    cities 
WHERE 
    id 
NOT IN 
(
    SELECT id FROM 
    (
     -- Get the maximum id of each zip/city combination 
     -- This works for both duplicated and non-duplicated rows 
     SELECT 
      MAX(id) AS id, 
      city, 
      zip 
     FROM 
      cities 
     GROUP BY 
      city, 
      zip 
    ) ids_only 
) 
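As a quick check of this `NOT IN` variant, here is a minimal sketch using Python's built-in `sqlite3` module. The sample rows are made up, and `MAX(id)` is aliased as `id` so that the outer `SELECT id` can refer to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (id INTEGER PRIMARY KEY, city TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO cities (id, city, zip) VALUES (?, ?, ?)",
    [
        (1, "New York", "00000"),
        (2, "New York", "00000"),
        (3, "New York", "00000"),
        (4, "Boston", "11111"),
    ],
)

# Keep the row with the highest id per (city, zip) group, delete all others.
conn.execute(
    """
    DELETE FROM cities
    WHERE id NOT IN (
        SELECT id
        FROM (
            SELECT MAX(id) AS id, city, zip
            FROM cities
            GROUP BY city, zip
        ) AS ids_only
    )
    """
)

kept = [row[0] for row in conn.execute("SELECT id FROM cities ORDER BY id")]
print(kept)
```

Unlike the variants that compare against a lower id, this one keeps the newest row of each group. Using `MIN(id)` instead would keep the oldest.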
0

The accepted answer did not work on my Oracle database. This did:

DELETE FROM 
    mytable A 
WHERE 
    A.rowid > 
    ANY (
    SELECT 
     B.rowid 
    FROM 
     mytable B 
    WHERE 
     A.col1 = B.col1 
    AND 
     A.col2 = B.col2 
     ); 

(This also works with another column in place of ROWID, provided its values are unique per row.)

Found here