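How do I clean up my join_table and remove duplicate entries?

I have 2 models - Question and Tag - with a HABTM association between them, and they share a join table, questions_tags. How do I clean up my join_table and remove duplicate entries?

Feast your eyes on this bad boy: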

1.9.3p392 :011 > Question.count 
    (852.1ms) SELECT COUNT(*) FROM "questions" 
=> 417 
1.9.3p392 :012 > Tag.count 
    (197.8ms) SELECT COUNT(*) FROM "tags" 
=> 601 
1.9.3p392 :013 > Question.connection.execute("select count(*) from questions_tags").first["count"].to_i 
    (648978.7ms) select count(*) from questions_tags 
=> 39919778

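I assume the questions_tags join table contains a bunch of duplicate records; otherwise, I have no idea why it would be so large.

How do I clean up the join table so that it only has unique content? Or how do I even check whether it contains duplicate records?

Edit 1

I am using PostgreSQL, and this is the schema for the join_table questions_tags: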

create_table "questions_tags", :id => false, :force => true do |t|
  t.integer "question_id"
  t.integer "tag_id"
end

add_index "questions_tags", ["question_id"], :name => "index_questions_tags_on_question_id"
add_index "questions_tags", ["tag_id"], :name => "index_questions_tags_on_tag_id"
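
For the "how do I check" part, a minimal sketch in plain SQL, assuming only the two columns from the schema above. The aliases total_rows, distinct_pairs, pairs and copies are illustrative names, not part of the original schema. On a table of this size both queries may take a long time.

```sql
-- Compare the total row count with the number of distinct pairs.
-- If the two numbers differ, the table contains duplicates.
SELECT (SELECT COUNT(*) FROM questions_tags) AS total_rows,
       (SELECT COUNT(*)
          FROM (SELECT DISTINCT question_id, tag_id
                  FROM questions_tags) AS pairs) AS distinct_pairs;

-- List the most heavily duplicated (question_id, tag_id) pairs.
SELECT question_id, tag_id, COUNT(*) AS copies
  FROM questions_tags
 GROUP BY question_id, tag_id
HAVING COUNT(*) > 1
 ORDER BY copies DESC
 LIMIT 20;
```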

Source

2013-03-12 marcamillion

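I'm adding this as a new answer because it is quite different from my last one. This one does not assume that you have an id column on the join table. It creates a new table, selects only the unique rows into it, then drops the old table and renames the new one. This is much faster than anything involving a subselect.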

foo=# select * from questions_tags;
 question_id | tag_id
-------------+--------
           1 |      2
           2 |      1
           2 |      2
           1 |      1
           1 |      1
(5 rows)

foo=# select distinct question_id, tag_id into questions_tags_tmp from questions_tags;
SELECT 4
foo=# select * from questions_tags_tmp;
 question_id | tag_id
-------------+--------
           2 |      2
           1 |      2
           2 |      1
           1 |      1
(4 rows)

foo=# drop table questions_tags;
DROP TABLE
foo=# alter table questions_tags_tmp rename to questions_tags;
ALTER TABLE
foo=# select * from questions_tags;
 question_id | tag_id
-------------+--------
           2 |      2
           1 |      2
           2 |      1
           1 |      1
(4 rows)

Source

2013-03-13 00:51:15

Note that you will probably have to manually re-create any indexes the old table had. – 2013-03-13 00:53:01

Is there a 'psql' command to do this? Re-create the indexes, that is. – marcamillion 2013-03-13 00:53:49
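
Re-creating the indexes can be done with ordinary SQL from the psql prompt. The following is a sketch that assumes the deduplicated table has already been renamed to questions_tags; the index names and columns are copied from the schema in the question. Dropping the old table also drops its indexes, so the names should be free to reuse.

```sql
-- Re-create the two single-column indexes the original join table had.
CREATE INDEX index_questions_tags_on_question_id
    ON questions_tags (question_id);

CREATE INDEX index_questions_tags_on_tag_id
    ON questions_tags (tag_id);

-- In psql, the meta-command "\d questions_tags" lists the table's
-- columns and indexes, which is a quick way to confirm the result.
```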

Perfect... this worked. Now I am down to just 148K records. Thanks very much! Now... the question is... how do I prevent this from happening in the future? – marcamillion 2013-03-13 01:18:43
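
One common way to stop duplicates from coming back is a unique index over both columns. This is a sketch, not something proposed in the thread itself: the index name below is hypothetical, and the statement will fail if any duplicate pairs are still in the table. Once the index exists, an attempt to insert an existing pair raises an error that the application has to handle or avoid.

```sql
-- Hypothetical index name; only the table and column names come from
-- the schema in the question.
CREATE UNIQUE INDEX index_questions_tags_on_question_id_and_tag_id
    ON questions_tags (question_id, tag_id);

-- Note: rows containing NULL in either column are not treated as
-- duplicates of each other by a unique index.
```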

Delete tag associations with a bad tag reference

DELETE FROM questions_tags 
WHERE NOT EXISTS (SELECT 1 
       FROM tags 
       WHERE tags.id = questions_tags.tag_id);

Delete tag associations with a bad question reference

DELETE FROM questions_tags 
WHERE NOT EXISTS (SELECT 1 
       FROM questions 
       WHERE questions.id = questions_tags.question_id);

Delete duplicate tag associations

DELETE FROM questions_tags 
USING (SELECT qt3.user_id, qt3.question_id, MIN(qt3.id) id 
      FROM questions_tags qt3 
      GROUP BY qt3.user_id, qt3.question_id 
     ) qt2 
WHERE questions_tags.user_id=qt2.user_id AND 
     questions_tags.question_id=qt2.question_id AND 
     questions_tags.id != qt2.id

Note:

Please test the SQL statements in your development environment before trying them in your production environment.

Source

2013-03-12 23:19:28

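By the way... the first 2 executed fine and didn't return any records. But when I run the last one, it doesn't even run. I tried it with and without the trailing ';'. What could be causing this? – marcamillion 2013-03-12 23:43:48

In fact, when I add the ';' at the end of the query, I get a syntax error - https://gist.github.com/marcamillion/fc81b053c6c5928230c3 What else can I try? – marcamillion 2013-03-13 00:10:47

The last command above is not valid PostgreSQL. – 2013-03-13 00:20:18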
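
The duplicate-removal statement in this answer refers to user_id and id columns, neither of which appears in the questions_tags schema shown in the question (the table was created with :id => false and has only question_id and tag_id), which may be part of why it fails. As a sketch of an in-place alternative, PostgreSQL's system column ctid (the physical location of a row) can stand in for the missing id. This is an assumption-laden variant, not the answerer's method, and on tens of millions of rows it is likely to be far slower than copying the distinct rows into a new table.

```sql
-- For every group of identical (question_id, tag_id) rows, keep the row
-- with the smallest ctid and delete the others.
DELETE FROM questions_tags a
 USING questions_tags b
 WHERE a.question_id = b.question_id
   AND a.tag_id      = b.tag_id
   AND a.ctid        > b.ctid;
```

Rows with NULL in either column never match the equality conditions, so this statement leaves them alone.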
