Efficiently bulk-updating a Rails database

I am trying to build a rake utility that will update my database every so often.

This is the code I have so far:

namespace :utils do

  # utils:update_ip
  # Downloads the file from <url> to the temp folder, unzips it to <file_path>,
  # then updates the database.

  desc "Update ip-to-country database"
  task :update_ip => :environment do

    require 'open-uri'
    require 'zip/zipfilesystem'
    require 'csv'

    file_name = "ip-to-country.csv"
    file_path = "#{RAILS_ROOT}/db/" + file_name
    url = 'http://ip-to-country.webhosting.info/downloads/ip-to-country.csv.zip'

    # Check the last time we updated the database.
    mod_time = ''
    mod_time = File.new(file_path).mtime.httpdate if File.exist? file_path

    begin
      puts 'Downloading update...'
      # Send a conditional GET to the server.
      zipped_file = open(url, {'If-Modified-Since' => mod_time})
    rescue OpenURI::HTTPError => the_error
      if the_error.io.status[0] == '304'
        puts 'Nothing to update.'
      else
        puts 'HTTPError: ' + the_error.message
      end
    else # The file was downloaded without error.

      Rails.logger.info 'ip-to-country: Remote database was last updated: ' + zipped_file.meta['last-modified']
      delay = Time.now - zipped_file.last_modified
      Rails.logger.info "ip-to-country: Database was outdated for: #{delay} seconds (#{delay/60/60/24} days)"

      puts 'Unzipping...'
      File.delete(file_path) if File.exist? file_path
      Zip::ZipFile.open(zipped_file.path) do |zipfile|
        zipfile.extract(file_name, file_path)
      end

      Iptocs.delete_all

      puts "Importing new database..."

      # TODO: way, way too heavy; find a better solution.

      # CSV.foreach yields one row (an Array) at a time.
      CSV.foreach(file_path) do |row|
        ip = Iptocs.new(:ip_from       => row.shift,
                        :ip_to         => row.shift,
                        :country_code2 => row.shift,
                        :country_code3 => row.shift,
                        :country_name  => row.shift)
        ip.save
      end # CSV
      puts "Complete."

    end # begin-rescue
  end # task
end # namespace

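The problem I'm having is that it takes several minutes to insert the 100,000-plus entries. I'd like to find a more efficient way of updating my database. Ideally this would stay independent of the database type, but if not, my production server will be running on MySQL.

Thank you for any insight.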

2010-02-17 codr

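Have you tried using AR Extensions for bulk import? You get impressive performance improvements when inserting thousands of rows into the database. Visit their website for more details.

See these examples for more information.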

2010-02-18 04:48:51

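This is exactly what I was looking for, thanks. – codr 2010-02-19 01:54:05

The gem supports importing from CSV. This eliminates the `ActiveRecord` instantiation and validation costs. For more details, see this article: http://www.rubyinside.com/advent2006/17-extendingarhtml – 2010-02-19 02:24:52

Helped me too, thanks! – ambertch 2010-05-17 19:57:58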

You could generate a text file containing all the INSERT statements you need and then run:

mysql -u user -p db_name < mytextfile.txt

I'm not sure whether this will be any faster, but it's worth a try...
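As a rough sketch of how such a file could be produced from the CSV, the helper below writes one INSERT statement per row. The names `write_insert_file`, `sql_quote`, and `INSERT_COLUMNS` are made up for this illustration, the column list is taken from the rake task in the question, and the quoting assumes MySQL's default backslash-escape behaviour.

```ruby
require 'csv'

# Column names taken from the rake task in the question.
INSERT_COLUMNS = %w[ip_from ip_to country_code2 country_code3 country_name].freeze

# Quote a value as a SQL string literal. Backslashes and single quotes are
# doubled, which assumes MySQL's default escaping rules.
def sql_quote(value)
  "'" + value.to_s.gsub("\\") { "\\\\" }.gsub("'") { "''" } + "'"
end

# Read CSV data from `csv_io` (a String or IO) and write one INSERT
# statement per row to `out`. Returns the number of statements written.
def write_insert_file(csv_io, out, table: 'iptocs')
  count = 0
  CSV.new(csv_io).each do |row|
    values = row.first(INSERT_COLUMNS.size).map { |v| sql_quote(v) }.join(', ')
    out.puts "INSERT INTO #{table} (#{INSERT_COLUMNS.join(', ')}) VALUES (#{values});"
    count += 1
  end
  count
end

# Possible usage (file names are placeholders):
#   File.open('mytextfile.txt', 'w') do |out|
#     File.open('ip-to-country.csv') { |csv| write_insert_file(csv, out) }
#   end
```

The resulting file can then be fed to the `mysql` client as shown above. Each row is still its own statement, so any speed-up comes only from skipping the ActiveRecord overhead.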

2010-02-17 22:41:34 Zepplock

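Rails itself uses SQL INSERT statements; look at your Rails log. So this approach won't improve speed. – 2010-02-17 22:44:26

Of course Rails issues INSERTs; how else would it add records to the database? But in his original post the author is using the `save` method, which has far more overhead than a plain INSERT. I'm pretty sure it involves a commit for each insert, model validation, and so on. – Zepplock 2010-02-18 00:52:51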

Use the database-level utilities for high speed, Luke!

Unfortunately, they are database-specific. But they are fast. For MySQL, see http://dev.mysql.com/doc/refman/5.1/en/load-data.html
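For illustration, here is a minimal sketch of building a `LOAD DATA` statement for the CSV file from the question. The helper name `load_data_sql` is hypothetical, and the field and line terminators are assumptions about the file format (comma-separated, optionally double-quoted, `\n` line endings). `LOCAL` also requires that `local_infile` is enabled on both client and server.

```ruby
# Build a MySQL LOAD DATA statement for a comma-separated CSV file.
# The terminators below are assumptions about the file's format.
def load_data_sql(csv_path, table:, columns:)
  # Escape backslashes and single quotes so the path is a valid SQL literal.
  escaped_path = csv_path.gsub("\\") { "\\\\" }.gsub("'") { "\\'" }
  <<~SQL
    LOAD DATA LOCAL INFILE '#{escaped_path}'
    INTO TABLE #{table}
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
    (#{columns.join(', ')})
  SQL
end

# Possible usage inside the rake task, replacing the CSV loop:
#   sql = load_data_sql(file_path, table: 'iptocs',
#                       columns: %w[ip_from ip_to country_code2 country_code3 country_name])
#   ActiveRecord::Base.connection.execute(sql)
```

This hands the whole import to the server in one statement, at the cost of tying the task to MySQL.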

2010-02-17 22:42:18

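As Larry said, use your database-specific import utility if the file arrives in the format you want. However, if you need to manipulate the data before inserting it, you can generate a single INSERT query containing the data for multiple rows, which is faster than issuing a separate query for each row (as ActiveRecord does). For example: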

INSERT INTO iptocs (ip_from, ip_to, country_code) VALUES 
    ('xxx', 'xxx', 'xxx'), 
    ('yyy', 'yyy', 'yyy'), 
    ...;
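A statement like this could be generated in batches from Ruby. The following is only a sketch: `each_batched_insert` and `NAIVE_QUOTE` are names invented for this example, the batch size of 1000 is arbitrary, and the quoting is deliberately simplistic.

```ruby
# Naive string quoting, for illustration only. Inside Rails,
# ActiveRecord::Base.connection.quote(value) is the safer choice.
NAIVE_QUOTE = ->(value) { "'" + value.to_s.gsub("'") { "''" } + "'" }

# Yield one multi-row INSERT statement per batch of `batch_size` rows.
# `rows` is any Enumerable of arrays, for example the result of CSV.foreach(path).
def each_batched_insert(rows, table:, columns:, batch_size: 1000, quote: NAIVE_QUOTE)
  rows.each_slice(batch_size) do |batch|
    tuples = batch.map { |row| "(" + row.map { |v| quote.call(v) }.join(", ") + ")" }
    yield "INSERT INTO #{table} (#{columns.join(', ')}) VALUES\n  #{tuples.join(",\n  ")};"
  end
end

# Possible usage inside the rake task (rows can be transformed before batching):
#   each_batched_insert(CSV.foreach(file_path), table: 'iptocs',
#                       columns: %w[ip_from ip_to country_code2 country_code3 country_name]) do |sql|
#     ActiveRecord::Base.connection.execute(sql)
#   end
```

Batching keeps each statement below the server's maximum packet size while still cutting the number of round trips by roughly the batch size.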

2010-02-17 23:45:41

I'm currently experimenting with activerecord-import, which sounds very promising:

https://github.com/zdennis/activerecord-import
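A minimal sketch of how the CSV loop from the question might look with this gem, assuming it is installed so that the model responds to `import(columns, values, options)`. The helper `import_csv`, the constant `IP_COLUMNS`, and the batch size are choices made for this example. Passing `validate: false` skips model validations, which is only appropriate if the data is trusted.

```ruby
require 'csv'

# Column names taken from the rake task in the question.
IP_COLUMNS = [:ip_from, :ip_to, :country_code2, :country_code3, :country_name].freeze

# Read the CSV in batches and pass each batch to `model.import`, the class
# method that activerecord-import adds to ActiveRecord models.
# Returns the number of rows handed to the model.
def import_csv(model, csv_path, batch_size: 1000)
  total = 0
  CSV.foreach(csv_path).each_slice(batch_size) do |rows|
    values = rows.map { |row| row.first(IP_COLUMNS.size) }
    model.import(IP_COLUMNS, values, validate: false)
    total += rows.size
  end
  total
end

# In the rake task this would replace the CSV loop:
#   import_csv(Iptocs, file_path)
```

Because only arrays of values are passed, no model instance is built for each row.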

2012-05-02 16:27:03 reto
