2012-12-27 96 views
2

Deleting redundant records

I have a table:

+------------+------------------------------------------------------+------+-----+---------+-------+
| Field      | Type                                                 | Null | Key | Default | Extra |
+------------+------------------------------------------------------+------+-----+---------+-------+
| person_id1 | int(10)                                              | NO   | MUL | 0       |       |
| person_id2 | int(10)                                              | NO   | MUL | 0       |       |
| priority   | smallint(5)                                          | NO   |     | 0       |       |
| link_type  | enum('member_of_band','legal_name','performs_as','') | NO   |     |         |       |
+------------+------------------------------------------------------+------+-----+---------+-------+

There is no primary key on this table, but there are indexes on person_id1 and on person_id2.

The problem is that we have inconsistent data. For example, this query:

SELECT 
    COUNT(*) as c, person_id1, person_id2 
FROM person_person 
WHERE link_type = "member_of_band" 
GROUP BY person_id1, person_id2 
HAVING c > 1 
LIMIT 10; 

returns:

+---+------------+------------+
| c | person_id1 | person_id2 |
+---+------------+------------+
| 2 |   50674235 |   51048792 |
| 3 |   50674245 |   50715733 |
| 2 |   50674283 |   50712621 |
| 2 |   50674322 |   50714244 |
| 2 |   50674378 |   51048804 |
| 2 |   50674438 |   51048812 |
| 4 |   50674442 |   50715733 |
| 2 |   50674449 |   50716913 |
| 2 |   50674455 |   51048803 |
| 3 |   50674469 |   50715733 |
+---+------------+------------+

Is there any way to remove all the redundant records and keep the ones that are fine?

All I have come up with is:

DELETE person_person FROM person_person 
WHERE (person_id1, person_id2) IN (

    SELECT 
     person_id1, person_id2 
    FROM person_person 
    WHERE link_type = "member_of_band" 
    GROUP BY person_id1, person_id2 
    HAVING COUNT(*) > 1 
    LIMIT 100 

) AND link_type = "member_of_band"; 

But this deletes every record that has duplicates, whereas I need to delete only the extra copies.

mysql> select * from person_person where person_id1 = 50674245 and person_id2 = 50715733;
+------------+------------+----------+----------------+
| person_id1 | person_id2 | priority | link_type      |
+------------+------------+----------+----------------+
|   50674245 |   50715733 |        0 | member_of_band |
|   50674245 |   50715733 |        0 | member_of_band |
|   50674245 |   50715733 |        0 | member_of_band |
+------------+------------+----------+----------------+

Which of the "duplicates" do you want to delete, and which would you keep (supposing they have different `priority` values)? – eggyal


Any of them. Suppose we have: – nikita2206


I don't understand what the redundancy is. By the way, you might also want to look at [normalizing](http://en.wikipedia.org/wiki/Database_normalization) the database. – JJJ

Answers

4
ALTER IGNORE TABLE person_person ADD UNIQUE INDEX (person_id1, person_id2, link_type); 
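
A caveat for newer servers: `ALTER IGNORE TABLE` was removed in MySQL 5.7.4, so the statement above only runs on older versions. A rough equivalent that does not rely on it is to copy the rows into a new table that already has the unique key and let `INSERT IGNORE` skip the extras. This is only a sketch: `person_person_dedup` and `person_person_old` are hypothetical table names, and it assumes nothing writes to `person_person` while the copy runs.

```sql
-- Sketch only; person_person_dedup and person_person_old are made-up names.

-- 1. Create an empty copy with the same columns and indexes.
CREATE TABLE person_person_dedup LIKE person_person;

-- 2. Add the unique key to the (still empty) copy.
ALTER TABLE person_person_dedup
    ADD UNIQUE INDEX (person_id1, person_id2, link_type);

-- 3. Copy the rows. INSERT IGNORE skips any row that would violate the
--    unique key, so one row per (person_id1, person_id2, link_type) remains.
INSERT IGNORE INTO person_person_dedup
SELECT person_id1, person_id2, priority, link_type
FROM person_person;

-- 4. Swap the tables in one statement, keeping the original as a backup.
RENAME TABLE person_person TO person_person_old,
             person_person_dedup TO person_person;
```

Like the `ALTER IGNORE` version, this removes duplicates for every `link_type`, not only `member_of_band`. When duplicate rows differ in `priority`, which of them survives is not defined, since the `SELECT` has no `ORDER BY`.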

You were first ;) – Devart


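*At the very least*, you should include `link_type` in the index; however, it is unclear whether even that would meet the OP's requirements, since the question only asked to remove the "duplicates" where `link_type = 'member_of_band'`. – eggyal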


Added `link_type` to the index. I'm assuming he wants to delete all duplicates, and the query he showed was just an example. – Barmar