2011-05-07 129 views
2

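Delete duplicate rows from a table

My table has a unique key column, id, but another column contains duplicate values. How can I get rid of the duplicates and keep only one row per value, as shown below?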

With duplicate records:

id | name   | surname |
1  | test   | one     |
2  | test   | two     |
3  | test3  | three   |
4  | test7  | four    |
5  | test   | five    |
6  | test11 | eleven  |

Without duplicates:

id | name   | surname |
1  | test   | one     |
3  | test3  | three   |
4  | test7  | four    |
6  | test11 | eleven  |

I Googled this and found the following, but it doesn't seem to work:

DELETE ct1 
FROM mytable ct1 
     , mytable ct2 
WHERE ct1.name = ct2.name 
     AND ct1.id < ct2.id 

ERROR: syntax error at or near "ct1" 
LINE 1: DELETE ct1 
       ^

********** Error ********** 

I'm using a PostgreSQL database.

+0

Once you've cleaned up the data, you'll probably want to add a UNIQUE constraint on "name". – 2011-05-08 03:18:33
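A minimal sketch of what that constraint could look like, assuming the table is named mytable as in the question. The constraint name mytable_name_key is a hypothetical choice, and the statement fails while duplicate names are still present, so it has to run after the cleanup:

```sql
-- Assumes the duplicates in "name" have already been removed.
-- mytable_name_key is an illustrative constraint name, not one from the question.
ALTER TABLE mytable
    ADD CONSTRAINT mytable_name_key UNIQUE (name);
```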

Answers

3

You can try running this several times:

delete from mytable where id in (
    select max(id)
    from mytable
    group by name
    having count(1) > 1
);

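Each run removes one row (the one with the highest id) from every group of duplicated names, so the number of runs you need equals the largest number of extra copies of any name: a name that occurs n times needs n - 1 runs.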
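One way to tell whether another run is needed is to list the names that still occur more than once. This is only a sketch against the mytable and name column from the question; the copies alias is made up for readability:

```sql
-- Names that still have more than one row.
-- An empty result means no duplicates remain and no further run is needed.
SELECT name, count(*) AS copies
FROM mytable
GROUP BY name
HAVING count(*) > 1;
```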

Otherwise, you can try this more complex query:

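delete from mytable where id in (
    select id from mytable
    except
    (
        -- lowest id of every name that occurs more than once
        select min(id)
        from mytable
        group by name
        having count(1) > 1
        union all
        -- the single id of every name that occurs exactly once
        select min(id)
        from mytable
        group by name
        having count(1) = 1
    )
);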

Running this query once should delete everything you need to remove. I haven't tested it, though...
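As a side note, the two branches of the UNION ALL together cover every name group (count > 1 and count = 1), so the same set of rows can probably be expressed more compactly. A sketch, assuming id is never NULL (it is the table's key):

```sql
-- Keep the lowest id for each name and delete every other row.
-- Relies on id being NOT NULL so that NOT IN behaves as expected.
DELETE FROM mytable
WHERE id NOT IN (
    SELECT min(id)
    FROM mytable
    GROUP BY name
);
```

Running it inside a transaction (BEGIN, then COMMIT or ROLLBACK) makes it easy to inspect the result before keeping it.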

+0

The complex query worked great, even though you hadn't tried it. – 2011-05-07 13:05:07

+2

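Glad to help. For complex grouping like this, I suggest learning about window functions, such as the RANK that @Dalen hints at in the other answer. They are worth learning. – 2011-05-07 13:07:45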

3

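Use RANK. I'm not entirely sure about the syntax, since I'm not very experienced with PostgreSQL, so treat this only as a hint (corrections from anyone are welcome):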

DELETE FROM mytable 
WHERE id NOT IN 
(
    SELECT x.id FROM 
    (
     SELECT id, RANK() OVER (PARTITION BY name ORDER BY id ASC) AS r 
     FROM mytable 
    ) x 
    WHERE x.r = 1 
)
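
For reference, the multi-table DELETE ct1 FROM ... form tried in the question is not PostgreSQL syntax; PostgreSQL's DELETE takes additional tables in a USING clause instead. The sketch below shows that form on a throwaway copy of the question's sample data. The table name mytable_demo is hypothetical, and the comparison is ct1.id > ct2.id (not < as in the question) so that the lowest id of each name is the one kept:

```sql
-- Hypothetical demo table mirroring the sample data in the question.
CREATE TEMP TABLE mytable_demo (
    id      integer PRIMARY KEY,
    name    text,
    surname text
);

INSERT INTO mytable_demo (id, name, surname) VALUES
    (1, 'test',   'one'),
    (2, 'test',   'two'),
    (3, 'test3',  'three'),
    (4, 'test7',  'four'),
    (5, 'test',   'five'),
    (6, 'test11', 'eleven');

-- Delete every row for which another row with the same name and a lower id exists.
DELETE FROM mytable_demo ct1
USING mytable_demo ct2
WHERE ct1.name = ct2.name
  AND ct1.id > ct2.id;

-- Remaining rows have ids 1, 3, 4 and 6.
SELECT id, name, surname
FROM mytable_demo
ORDER BY id;
```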