2015-08-20 55 views
1

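I have this code, and it takes a very long time to run. How can I optimize this Ruby script?

When I run it with -r profile, it shows that most of the time seems to go to mysql ... how can I speed that up? MySQL bulk inserts?

The profiler output is here: http://pastebin.com/fH51ZeEB

Code: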

#!/usr/bin/env ruby 

require 'mysql' 
require 'open-uri' 
require 'nokogiri' 
begin
    i=0
    src = Mysql.new 'localhost', 'me', 'pass', 'db'
    rs = src.query("SELECT * FROM npanxx")
    rs.each_hash do |row|
        doc = Nokogiri::XML(open("http://localcallingguide.com/xmllocalprefix.php?npa="<< row["npa"].to_s << "&nxx=" << row["nxx"].to_s << "&dir=1"))
        lca = Hash.new
        doc.xpath("//prefix/npa | //prefix/nxx | //prefix/exch").each do |prefix|
            if !lca.has_key? "npa"
                lca["npa"] = prefix.content
                next
            end
            if !lca.has_key? "nxx"
                lca["nxx"] = prefix.content
                next
            end
            if !lca.has_key? "exch"
                lca["exch"] = prefix.content
                src.query("INSERT INTO npanxxlca (npa,nxx,tnpa,tnxx,texch) VALUES (#{row['npa']}, #{row['nxx']}, #{lca['npa']}, #{lca['nxx']}, #{lca['exch']})")
                lca = Hash.new
            end
        end
        puts (i+=1).to_s << "- #{row['npa']}, #{row['nxx']}\n"
    end
rescue Mysql::Error => e
    puts e.errno
    puts e.error
ensure
    src.close if src
end
+0

This seems better suited to http://codereview.stackexchange.com/, since this code actually works, doesn't it? – Oka

+0

Yes, I didn't know that existed... – zevlag

Answers

1

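With Typhoeus Hydra you can make requests in parallel. It lets you set a custom max concurrency (the default is 200).
Also, instead of parsing the XML with Nokogiri, running XPath searches for the values several times, and storing them in a new hash each time, you can use crack to parse the XML directly into a Hash object: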

require 'benchmark' 
require 'typhoeus' 
require 'mysql' 
require 'crack' 
require 'json' 

BASE_URL ||= 'http://localcallingguide.com/xmllocalprefix.php'.freeze 

HOST  ||= 'localhost'.freeze 
USER  ||= 'me'.freeze 
PASSWORD ||= 'pass'.freeze 
DATABASE ||= 'db'.freeze 

# 
# Build lca request based on provided npa and nxx 
# @param [Integer, String] npa - NPA 
# @param [Integer, String] nxx - NXX 
# @return [Typhoeus::Request] - request object 
def lca_request(npa, nxx) 
    Typhoeus::Request.new(BASE_URL, params: { dir: 1, npa: npa, nxx: nxx }) 
end 

# 
# Convert XML string into Hash object 
# @param [String] xml - XML string to convert 
# @return [Hash] Ruby Hash object converted from XML string 
def xml_to_hash(xml) 
    Crack::XML.parse(xml) 
end 

# 
# Fetch lca_data from Hash response 
# Response with error will be converted to empty array 
# @param [Hash] hash - response 
# @return [Array] lca data from response. Empty array if invalid data provided 
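def lca_data(hash)
    data = hash['root']['lca_data']['prefix']
    # A single <prefix> is parsed as a Hash, several as an Array of Hashes.
    # The parentheses matter: `data.is_a? Hash ? a : b` parses as data.is_a?(Hash ? a : b).
    data.is_a?(Hash) ? [data] : Array(data)
rescue NoMethodError
    []
end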

# 
# Fetch lca_data from XML string (see #lca_data) 
# @param [String] xml - string from where to fetch lca_data 
# @return [Array] lca data from response. Empty array if invalid data provided
def lca_data_from_xml(xml) 
    lca_data(xml_to_hash(xml)) 
end 

# Main function
def main
    src = Mysql.new(HOST, USER, PASSWORD, DATABASE)
    rs = src.query('SELECT * FROM npanxx')
    hydra = Typhoeus::Hydra.new
    rs.each_hash do |row|
        npa, nxx = row['npa'], row['nxx']
        request = lca_request(npa, nxx)
        request.on_complete do |response|
            lca_data = lca_data_from_xml(response.body)
            lca_data.each do |lca|
                src.query("INSERT INTO npanxxlca (npa,nxx,tnpa,tnxx,texch) VALUES (#{npa}, #{nxx}, #{lca['npa']}, #{lca['nxx']}, #{lca['exch']})")
            end
        end
        hydra.queue(request)
    end
    hydra.run
end

puts Benchmark.measure { main }.real 
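
To make the idea of a concurrency cap concrete, here is a minimal sketch that uses only the Ruby standard library. It is not Typhoeus and not what the code above does internally; it just shows a fixed number of worker threads draining a queue of jobs. The name `fetch_all` and its `limit:` parameter are hypothetical, and the block below is a stand-in for the real HTTP call.

```ruby
require 'uri'

# Hypothetical helper: call the block for every job, using at most `limit` threads.
# Results are returned in the same order as the jobs.
def fetch_all(jobs, limit: 10, &fetch)
  queue = Queue.new
  jobs.each_with_index { |job, index| queue << [job, index] }
  queue.close                       # no more jobs will be added
  results = Array.new(jobs.size)

  workers = Array.new([limit, jobs.size].min) do
    Thread.new do
      # pop returns nil once the closed queue is empty, which ends the loop
      while (item = queue.pop)
        job, index = item
        results[index] = fetch.call(job)
      end
    end
  end
  workers.each(&:join)
  results
end

rows = [{ 'npa' => 907, 'nxx' => 221 }, { 'npa' => 907, 'nxx' => 222 }]
queries = fetch_all(rows, limit: 2) do |row|
  # Stand-in for the HTTP request: only builds the query string.
  URI.encode_www_form(dir: 1, npa: row['npa'], nxx: row['nxx'])
end
puts queries
```

In a real script the block would perform the request and return the response body. A lower limit is kinder to the remote server, at the cost of a longer total run time.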

I don't have much experience working with MySQL, so I can't recommend how to optimize that part.

+0

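I didn't test the final code, because I don't have a MySQL server and database set up. So if you have any questions or problems, please let me know. If this works, I'm curious how much faster it is :) –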

+0

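I like this approach, but I ran into a problem: when only one record is returned, I get an array of arrays instead of an array of hashes: [["npa", "907"], ["nxx", "221"], ["exch", "003650"], ["ocn", "3023"], ["company_name", "UNITED UTILITIES, INC."], ["rc", "Birch Creek"], ["region", "AK"]] npa.rb:65:in `[]': no implicit conversion of String into Integer (TypeError) from npa.rb:65:in `block (3 levels) in main' – zevlag

+0

@zevlag I updated the `lca_data` method to make sure it returns an array of hashes. –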
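
The array-of-arrays result in the comment above comes from how `Array()` treats a Hash: it converts it into key/value pairs. A small sketch of the normalization, using a hypothetical helper name `normalize_prefixes` and made-up sample data:

```ruby
# Hypothetical helper: always return an array of hashes, whatever the parser produced.
def normalize_prefixes(data)
  case data
  when nil  then []          # no <prefix> element at all
  when Hash then [data]      # exactly one <prefix> element
  else           Array(data) # several <prefix> elements
  end
end

single = { 'npa' => '907', 'nxx' => '221' }

p Array(single)                          # => [["npa", "907"], ["nxx", "221"]]
p normalize_prefixes(single).first['npa'] # => "907"
```

With the hash wrapped in an array, `lca['npa']` in the insert loop receives a Hash instead of a two-element Array, so the TypeError no longer occurs.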

2

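You could try inserting multiple rows at once; I think that is the bottleneck. First, keep the values in an array, and when the array is large enough, insert all of its rows in a single statement, like this: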

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9); 

how-to-insert-multiple-records-into-database

+0

I like to stop at 100 rows or 1 MB, whichever comes first. –
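
A rough sketch of the batching described above, flushing at 100 rows or about 1 MB, whichever comes first. The class name `BatchInserter` is hypothetical, and the quoting is deliberately naive and for illustration only; real code should use the MySQL driver's own escaping or prepared statements. The block passed to the constructor stands in for `src.query`.

```ruby
# Hypothetical helper: buffers rows and emits one multi-row INSERT per batch.
class BatchInserter
  MAX_ROWS  = 100        # flush after this many rows ...
  MAX_BYTES = 1_048_576  # ... or once the buffered values reach about 1 MB

  # table and columns are assumed to be trusted identifiers.
  # The block receives each finished SQL string (e.g. { |sql| src.query(sql) }).
  def initialize(table, columns, &execute)
    @prefix  = "INSERT INTO #{table} (#{columns.join(',')}) VALUES "
    @execute = execute
    @tuples  = []
    @bytes   = 0
  end

  # Buffer one row and flush when either limit is reached.
  def add(values)
    tuple = "(#{values.map { |v| quote(v) }.join(',')})"
    @tuples << tuple
    @bytes += tuple.bytesize
    flush if @tuples.size >= MAX_ROWS || @bytes >= MAX_BYTES
  end

  # Send whatever is buffered; call once more at the end of the run.
  def flush
    return if @tuples.empty?
    @execute.call(@prefix + @tuples.join(','))
    @tuples.clear
    @bytes = 0
  end

  private

  # Naive quoting, an assumption for this sketch only.
  def quote(value)
    "'" + value.to_s.gsub("\\") { "\\\\" }.gsub("'") { "''" } + "'"
  end
end

statements = []
inserter = BatchInserter.new('npanxxlca', %w[npa nxx tnpa tnxx texch]) { |sql| statements << sql }
250.times { |i| inserter.add([907, 221, 907, 200 + i, '003650']) }
inserter.flush
puts statements.size # 3 statements: 100 + 100 + 50 rows
```

Quoting `texch` as a string also keeps leading zeros such as `003650`, provided the column is a character type. The 1 MB cap is there to stay under the server's maximum packet size, which is configurable.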