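Remove duplicate records based on multiple columns?

I'm using Heroku to host my Ruby on Rails application and, for some reason, I may have some duplicate rows.

Is there a way to delete duplicate records based on 2 or more criteria, but keep just 1 record of each set of duplicates?

In my use case, I have a Make and Model relationship for cars in my database.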

Make  Model 
---  --- 
Name  Name 
      Year 
      Trim 
      MakeId

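I want to delete all Model records that have the same Name, Year and Trim, but keep 1 of those records (meaning, I need the record, but only once). I'm using the Heroku console, so I can easily run some Active Record queries.

Any suggestions?

Source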

2013-01-02 sergserg

122

class Model

  def self.dedupe
    # find all models and group them on keys which should be common
    grouped = all.group_by { |model| [model.name, model.year, model.trim, model.make_id] }
    grouped.values.each do |duplicates|
      # the first one we want to keep right?
      first_one = duplicates.shift # or pop for last one
      # if there are any more left, they are duplicates
      # so delete all of them
      duplicates.each { |double| double.destroy } # duplicates can now be destroyed
    end
  end

end

Model.dedupe

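Find all the records
Group them on the keys you need for uniqueness
Loop over the values of the resulting hash (the grouped models)
Remove the first value from each group, because you want to retain one copy
Delete the rest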
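
The steps above can be tried out without a database. The following is a minimal in-memory sketch, assuming plain Struct objects in place of Active Record models; `CarModel` and `split_duplicates` are hypothetical names used only for illustration, and nothing is actually destroyed.

```ruby
# Hypothetical stand-in for rows of the Model table (not an Active Record class).
CarModel = Struct.new(:id, :name, :year, :trim, :make_id)

# Returns [kept, duplicates]: the first record of every group is kept,
# the remaining records of that group are reported as duplicates.
def split_duplicates(records)
  kept = []
  duplicates = []
  grouped = records.group_by { |m| [m.name, m.year, m.trim, m.make_id] }
  grouped.each_value do |group|
    first, *rest = group
    kept << first
    duplicates.concat(rest)
  end
  [kept, duplicates]
end

rows = [
  CarModel.new(1, "Civic", 2012, "LX", 10),
  CarModel.new(2, "Civic", 2012, "LX", 10), # same key as id 1
  CarModel.new(3, "Civic", 2013, "LX", 10),
  CarModel.new(4, "Civic", 2012, "LX", 10)  # same key as id 1
]

kept, duplicates = split_duplicates(rows)
puts "keep:   #{kept.map(&:id).inspect}"       # keep:   [1, 3]
puts "delete: #{duplicates.map(&:id).inspect}" # delete: [2, 4]
```

Because `group_by` preserves the order in which records were loaded, "first" here simply means the first record returned for each key; adding an explicit sort before grouping would make the choice of survivor deliberate.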

Source

2013-01-02 15:15:47

Does this go in the Model model? – meetalexjohnson

@meetalexjohnson It should go in whichever Active Record model you want to dedupe. –

This rocks, thanks! –

If your users table data looks like the following:

User.all => 
[ 
    #<User id: 15, name: "a", email: "[email protected]", created_at: "2013-08-06 08:57:09", updated_at: "2013-08-06 08:57:09">, 
    #<User id: 16, name: "a1", email: "[email protected]", created_at: "2013-08-06 08:57:20", updated_at: "2013-08-06 08:57:20">, 
    #<User id: 17, name: "b", email: "[email protected]", created_at: "2013-08-06 08:57:28", updated_at: "2013-08-06 08:57:28">, 
    #<User id: 18, name: "b1", email: "[email protected]", created_at: "2013-08-06 08:57:35", updated_at: "2013-08-06 08:57:35">, 
    #<User id: 19, name: "b11", email: "[email protected]", created_at: "2013-08-06 09:01:30", updated_at: "2013-08-06 09:01:30">, 
    #<User id: 20, name: "b11", email: "[email protected]", created_at: "2013-08-06 09:07:58", updated_at: "2013-08-06 09:07:58">] 
1.9.2p290 :099 >

The email IDs are duplicated, so our aim is to remove all duplicate email IDs from the users table.

Step 1:

Get the IDs of all distinct email records.

ids = User.select("MIN(id) as id").group(:email,:name).collect(&:id) 
=> [15, 16, 18, 19, 17]

Step 2:

Remove the duplicates from the users table, using the IDs of the distinct email records.

The ids array now contains the following IDs.

[15, 16, 18, 19, 17] 
User.where("id NOT IN (?)",ids) # To get all duplicate records 
User.where("id NOT IN (?)",ids).destroy_all

**Rails 4, 5**

Active Record 4 introduced the .not method, which lets you write Step 2 as follows:

User.where.not(id: ids).destroy_all
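
To see why the two steps leave exactly one row per group, here is a small in-memory sketch in plain Ruby. The `UserRow` struct, the `ids_to_keep` and `duplicate_rows` helpers, and the email addresses are all assumptions made up for illustration, so only the grouping logic should be read as meaningful.

```ruby
# Hypothetical stand-in for rows of the users table.
UserRow = Struct.new(:id, :name, :email)

# Step 1 equivalent: the smallest id of every (email, name) group.
def ids_to_keep(rows)
  rows.group_by { |r| [r.email, r.name] }
      .map { |_key, group| group.map(&:id).min }
end

# Step 2 equivalent: every row whose id is not in the keep list.
def duplicate_rows(rows)
  keep = ids_to_keep(rows)
  rows.reject { |r| keep.include?(r.id) }
end

# Ids and names follow the listing; the addresses are placeholders.
users = [
  UserRow.new(15, "a",   "a@example.com"),
  UserRow.new(16, "a1",  "a1@example.com"),
  UserRow.new(17, "b",   "b@example.com"),
  UserRow.new(18, "b1",  "b1@example.com"),
  UserRow.new(19, "b11", "b11@example.com"),
  UserRow.new(20, "b11", "b11@example.com")
]

puts ids_to_keep(users).sort.inspect          # [15, 16, 17, 18, 19]
puts duplicate_rows(users).map(&:id).inspect  # [20]
```

A SQL GROUP BY without an ORDER BY returns groups in no guaranteed order, which is consistent with the ids in Step 1 coming back as [15, 16, 18, 19, 17].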

Source

2013-08-06 09:11:33

Thanks, this helped me! –

The grouping does not work ... it fails with an exception –

You can try this SQL query to remove all duplicate records except the latest one:

DELETE FROM users USING users u2 WHERE (users.name = u2.name AND users.year = u2.year AND users.trim = u2.trim AND users.id < u2.id);
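
The join condition in that statement can be read as: a row is deleted when another row with the same name, year and trim has a larger id. Below is a minimal plain-Ruby sketch of that predicate, assuming hypothetical `CarRow` structs rather than a real table; `survivors` is an illustrative helper, not part of any library.

```ruby
# Hypothetical stand-in for a table row.
CarRow = Struct.new(:id, :name, :year, :trim)

# A row survives unless some other row shares its key and has a larger id,
# which mirrors the id comparison in the self-join.
def survivors(rows)
  rows.reject do |row|
    rows.any? do |other|
      other.name == row.name && other.year == row.year &&
        other.trim == row.trim && row.id < other.id
    end
  end
end

table = [
  CarRow.new(1, "Civic", 2012, "LX"),
  CarRow.new(2, "Civic", 2012, "LX"),
  CarRow.new(3, "Accord", 2012, "EX"),
  CarRow.new(4, "Civic", 2012, "LX")
]

puts survivors(table).map(&:id).inspect # [3, 4]
```

One difference from SQL is worth keeping in mind: in SQL a comparison with NULL is never true, so rows with a NULL in any of the compared columns would not be matched by the statement, whereas `nil == nil` is true in this sketch.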

Source

2015-06-01 07:41:01

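This will delete everything. – monteirobrena

Similar to @Aditya Sanghi's answer, but this way is more performant because you select only the duplicates, rather than loading every Model object into memory and then iterating over all of them.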

# returns only duplicates in the form of [[name1, year1, trim1], [name2, year2, trim2],...]
duplicate_row_values = Model.select('name, year, trim, count(*)')
                            .group('name, year, trim')
                            .having('count(*) > 1')
                            .pluck(:name, :year, :trim)

# load the duplicates, order them however you want, and then destroy all but one
duplicate_row_values.each do |name, year, trim|
  Model.where(name: name, year: year, trim: trim).order(id: :desc)[1..-1].each(&:destroy)
end
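
The same two phases (find only the keys that occur more than once, then keep the newest row per key) can be sketched in memory. This is an illustration under assumptions: `Entry`, `duplicated_keys` and `stale_ids` are hypothetical names, and plain arrays stand in for the database queries.

```ruby
# Hypothetical stand-in for a row of the models table.
Entry = Struct.new(:id, :name, :year, :trim)

# Key tuples that occur more than once, like GROUP BY ... HAVING count(*) > 1.
def duplicated_keys(rows)
  rows.group_by { |r| [r.name, r.year, r.trim] }
      .select { |_key, group| group.size > 1 }
      .keys
end

# For each duplicated key, sort newest first and return the ids after the first.
def stale_ids(rows)
  duplicated_keys(rows).flat_map do |key|
    matches = rows.select { |r| [r.name, r.year, r.trim] == key }
    matches.sort_by { |r| -r.id }.drop(1).map(&:id)
  end
end

entries = [
  Entry.new(1, "Civic", 2012, "LX"),
  Entry.new(2, "Civic", 2012, "LX"),
  Entry.new(3, "Civic", 2012, "LX"),
  Entry.new(4, "Accord", 2012, "EX")
]

puts duplicated_keys(entries).inspect # [["Civic", 2012, "LX"]]
puts stale_ids(entries).inspect       # [2, 1]
```

Keys that appear only once never enter the second loop, which is where the saving over loading every record comes from.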

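Also, if you truly don't want duplicate data in this table, you probably want to add a multi-column unique index to the table, something along the lines of: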

add_index :models, [:name, :year, :trim], unique: true, name: 'index_unique_models'

Source

2016-01-12 07:41:56 mackshkatz

To run this inside a migration, I ended up doing something like the following (based on the answer above by @aditya-sanghi):

class AddUniqueIndexToXYZ < ActiveRecord::Migration
  def change
    # delete duplicates before the unique index is created
    dedupe(XYZ, 'name', 'type')

    add_index :xyz, [:name, :type], unique: true
  end

  def dedupe(model, *key_attrs)
    model.select(key_attrs).group(key_attrs).having('count(*) > 1').each do |duplicates|
      # splat the keys: Hash#slice takes them as separate arguments
      dup_rows = model.where(duplicates.attributes.slice(*key_attrs)).to_a
      # the first one we want to keep right?
      dup_rows.shift

      dup_rows.each { |double| double.destroy } # duplicates can now be destroyed
    end
  end
end
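
One detail in a helper like this is worth checking: how the key attributes are handed to `Hash#slice`. It takes the keys as separate arguments, so an array of keys has to be splatted. A minimal sketch in plain Ruby, where the attribute hash and both helper names are made up for illustration:

```ruby
# Attributes in the shape Active Record returns them: a Hash with string keys.
attributes = { "id" => nil, "name" => "Civic", "type" => "Sedan" }
key_attrs = ["name", "type"]

# Passing the array itself looks for one key that equals the whole array.
def slice_without_splat(attributes, key_attrs)
  attributes.slice(key_attrs)
end

# Splatting passes "name" and "type" as two separate keys.
def slice_with_splat(attributes, key_attrs)
  attributes.slice(*key_attrs)
end

p slice_without_splat(attributes, key_attrs) # an empty hash
p slice_with_splat(attributes, key_attrs)    # only the name and type pairs
```

This matters because `where` adds no conditions when it receives an empty hash, so a lookup built from the unsplatted form would match every row in the table rather than just the duplicates of one key.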

Source

2016-03-22 14:25:23
