2012-04-30 58 views
4

Rails 3.1, Ruby 1.9.2, AR/MySQL. Rails: keep only one of many records per day. Keep the last one, delete the rest

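I'm looking for advice on how to keep only one result per time period (a day) when there are many results of the same type in that period. An example would be tracking a stock price. Initially we save the price every 15 minutes, but we only need to store every individual price point for 1 week. After the first week, we only need 1 price per day (the last one recorded, the closing price).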

Here is a simple first attempt that does the job but is very inefficient:

# stock has many prices, price belongs to one stock
# get all prices for a single stock older than 1 week, oldest first
prices = stock.prices.where("created_at < ?", Time.now - 1.week).order("created_at")
prices.group_by { |price| price.created_at.to_date }.each do |k, v| # group by day
    if v.count > 1 # if many price points that day
        v[0..-2].each { |r| r.delete } # delete all but last record in day
    end
end

Thanks in advance for any help/advice. I will try to post updates as I go; hopefully they will help somebody.

Answers

1

Instead of deleting each record one by one, like this:

v[0..-2].each { |r| r.delete }

do a delete_all on everything except the last record of each day:

price_ids_to_keep = []
prices.group_by { |price| price.created_at.to_date }.each do |k, v|
    price_ids_to_keep << v[-1].id # keep the last record of each day
end

# one DELETE for everything that is not a keeper
prices.where('id not in (?)', price_ids_to_keep).delete_all

I have never done this myself, but I'm pretty sure it should work.
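
The selection logic can be checked without touching the database. The sketch below is a plain-Ruby illustration, not the asker's code: `PricePoint` and `ids_to_delete` are hypothetical names, and a `Struct` stands in for the ActiveRecord model.

```ruby
require 'date' # provides Time#to_date outside of Rails

# Hypothetical stand-in for a Price row (the real model is an ActiveRecord class).
PricePoint = Struct.new(:id, :created_at)

# Returns the ids of every record except the latest one of each calendar day.
def ids_to_delete(points)
  points.group_by { |p| p.created_at.to_date }.flat_map do |_day, day_points|
    keeper = day_points.max_by(&:created_at)
    day_points.reject { |p| p.equal?(keeper) }.map(&:id)
  end
end

points = [
  PricePoint.new(1, Time.utc(2012, 4, 2, 9, 30)),
  PricePoint.new(2, Time.utc(2012, 4, 2, 16, 0)),
  PricePoint.new(3, Time.utc(2012, 4, 3, 16, 0))
]
p ids_to_delete(points) # prints [1]
```

Using `max_by` rather than `v[-1]` means the result does not depend on the order in which the rows were loaded.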


This is better, since it cuts down the number of DELETE queries, but there should be a way to do all of this in at most one query.
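
One possible shape for that single query, as a sketch only: it assumes MySQL (multi-table DELETE syntax), a `prices` table with `stock_id` and `created_at` columns, and timestamps stored in UTC as Rails does by default. `prune_sql` is a hypothetical helper name.

```ruby
# Builds one MySQL statement that deletes every price older than `cutoff`
# except the latest row per stock per day. Assumes created_at is stored in UTC.
def prune_sql(cutoff)
  # getutc returns a copy, so the caller's Time object is not modified
  cutoff_str = cutoff.getutc.strftime('%Y-%m-%d %H:%M:%S')
  <<-SQL
    DELETE p FROM prices p
    JOIN (
      SELECT stock_id, DATE(created_at) AS price_day, MAX(created_at) AS last_at
      FROM prices
      WHERE created_at < '#{cutoff_str}'
      GROUP BY stock_id, DATE(created_at)
    ) k ON k.stock_id = p.stock_id AND k.price_day = DATE(p.created_at)
    WHERE p.created_at < k.last_at
  SQL
end
```

Inside Rails the statement could then be run with `ActiveRecord::Base.connection.execute(prune_sql(Time.now - 1.week))`. Rows that tie for the latest timestamp of a day are all kept.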


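From a business point of view, you or your team should think this over carefully. Storage is cheap nowadays, and information like this could be valuable for future data mining and similar work.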

3

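You can make this more efficient by doing all of it in SQL and by limiting each run to the rows added since the last run. Also, if you add a column that marks the old end-of-day entries as "archived", the queries become much simpler. An archived price is one that you will not delete after a week.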

rails generate migration add_archived_to_prices archived:boolean 

Before running the migration, edit it to also add an index on the created_at column.

class AddArchivedToPrices < ActiveRecord::Migration
    def self.up
        add_column :prices, :archived, :boolean
        add_index :prices, :created_at
    end

    def self.down
        remove_index :prices, :created_at
        remove_column :prices, :archived
    end
end

The workflow would go something like this:

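# Find the last entry for each day for each stock using SQL (more efficient than finding these in Ruby).
# MAX(id) stands in for "latest row of the day"; this assumes ids grow with created_at
# (auto-increment, insert-only). HAVING created_at = MAX(created_at) is not reliable here,
# because MySQL evaluates the bare created_at against an arbitrary row of each group.
keepers =
    Price.group('stock_id, DATE(created_at)').
        select('MAX(id) AS id').
        where('created_at > ?', last_run) # Keep track of the last run time to speed up subsequent runs

# Mark them as archived
Price.where('id IN (?)', keepers.map(&:id)).update_all(:archived => true)

# Delete everything but archived prices that are older than a week.
# archived is NULL until it is set, and NULL != true is never true in SQL, so test for NULL explicitly.
Price.where('archived IS NULL OR archived = ?', false).
    where('created_at < ?', Time.now - 1.week).
    where('created_at >= ?', last_run - 1.week). # anything older was already pruned by the previous run
    delete_all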

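Finally, note that you must not chain group() with update_all(): update_all() ignores group(), so the ids are collected first and the update runs as a separate query.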
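
The snippets above use `last_run` without showing where it comes from. One simple option, sketched here under assumptions, is a marker file; `LAST_RUN_FILE`, `read_last_run`, and `write_last_run` are hypothetical names, and a database row would work just as well.

```ruby
require 'time' # provides Time.iso8601 and Time#iso8601

# Hypothetical location of the marker file.
LAST_RUN_FILE = 'tmp/prune_prices.last_run'

# Returns the time of the previous run, or the Unix epoch if there was none,
# so that the very first run covers every existing row.
def read_last_run(path = LAST_RUN_FILE)
  File.exist?(path) ? Time.iso8601(File.read(path).strip) : Time.at(0).utc
end

# Stores the given time in UTC so the next run can pick it up.
def write_last_run(time, path = LAST_RUN_FILE)
  File.open(path, 'w') { |f| f.write(time.getutc.iso8601) }
end

# Intended use (left as comments because it needs the Rails models):
#   started_at = Time.now
#   last_run   = read_last_run
#   ... run the archive and delete queries with last_run ...
#   write_last_run(started_at)
```

Recording the time at which the job started, rather than the time at which it finished, avoids skipping rows inserted while the job was running.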
