2016-10-20 118 views

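How to get XML documents from a zip file downloaded in Rails

I am using Typhoeus to stream a zip file into memory and then iterating over each file in it to extract the XML documents. To read the XML I use Nokogiri, but I get the error Errno::ENOENT: No such file or directory @ rb_sysopen - my_xml_doc.xml.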

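I looked up the error and found that Ruby is most likely running the script in the wrong directory. I am a little confused: do I need to save the XML document to memory first before I can read it?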
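For context on the error itself: File.read resolves a relative name against the process's working directory, so a name that only exists inside an archive is not found on disk. The sketch below reproduces that with a temporary directory. The read_relative helper is hypothetical and only illustrates the behaviour; the file name is taken from the error message above.

```ruby
require "tmpdir"

# Name taken from the error message; here it stands in for a zip entry's name.
entry_name = "my_xml_doc.xml"

# Hypothetical helper: read a relative path, reporting ENOENT instead of raising.
def read_relative(name)
  File.read(name)
rescue Errno::ENOENT => e
  "not found: #{e.message}"
end

result_missing = nil
result_present = nil

Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    # No file with this name exists in the working directory yet.
    result_missing = read_relative(entry_name)

    # Once a file with that name is on disk, the same call succeeds.
    File.write(entry_name, "<doc/>")
    result_present = read_relative(entry_name)
  end
end

puts result_missing
puts result_present
```

The first call fails because nothing with that name exists in the working directory, which is the same situation as calling File.read with the name of an entry that still lives inside the zip.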

Here is my code to clarify further:

Controller

def index
  url = "http://feed.omgili.com/5Rh5AMTrc4Pv/mainstream/posts/"
  html_response = Typhoeus.get(url)
  doc = Nokogiri::HTML(html_response.response_body)

  path_array = []
  doc.search("a").each do |value|
    path_array << value.content if value.content.include?(".zip")
  end

  path_array.each do |zip_link|
    download_file = File.open zip_link, "wb"
    request = Typhoeus::Request.new("#{url}#{zip_link}")
    binding.pry

    request.on_headers do |response|
      if response.code != 200
        raise "Request failed"
      end
    end

    request.on_body do |chunk|
      download_file.write(chunk)
    end

    request.run

    Zip::File.open(download_file) do |zipfile|
      zipfile.each do |file|
        binding.pry
        doc = Nokogiri::XML(File.read(file.name))
      end
    end
  end
end

The file object:

=> #<Zip::Entry:0x007ff88998373 
@comment="", 
@comment_length=0, 
@compressed_size=49626, 
@compression_method=8, 
@crc=20393847, 
@dirty=false, 
@external_file_attributes=0, 
@extra={}, 
@extra_length=0, 
@filepath=nil, 
@follow_symlinks=false, 
@fstype=0, 
@ftype=:file, 
@gp_flags=2056, 
@header_signature=009890, 
@internal_file_attributes=0, 
@last_mod_date=18769, 
@last_mod_time=32626, 
@local_header_offset=0, 
@local_header_size=nil, 
@name="my_xml_doc.xml", 
@name_length=36, 
@restore_ownership=false, 
@restore_permissions=false, 
@restore_times=true, 
@size=138793, 
@time=2016-10-17 15:59:36 -0400, 
@unix_gid=nil, 
@unix_perms=nil, 
@unix_uid=nil, 
@version=20, 
@version_needed_to_extract=20, 
@zipfile="some_zip_file.zip"> 
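Do you always know what the size range of these XML files will be? If not, and if there is a chance they could get quite large, you may want to save them to disk before manipulating them.

I won't always know the size, thanks for the suggestion! Eventually I will be putting the XML straight into a Redis list. (I haven't gotten that far in the code yet.) – Ctpelnar1988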

Answer

Here is the solution I came up with:

Gems:

gem 'typhoeus' 
gem 'rubyzip' 
gem 'redis', '~>3.2' 

Controller:

def xml_to_redis_list(url)
  html_response = Typhoeus.get(url)
  doc = Nokogiri::HTML(html_response.response_body)
  @redis = Redis.new

  # Collect the text of every link that points at a zip archive
  path_array = []
  doc.search("a").each do |value|
    path_array << value.content if value.content.include?(".zip")
  end

  path_array.each do |zip_link|
    zip_path = Rails.root.join(zip_link).to_s
    request = Typhoeus::Request.new("#{url}#{zip_link}")

    request.on_headers do |response|
      raise "Request failed" if response.code != 200
    end

    # Stream the archive to disk. The block form closes (and flushes) the
    # file before it is reopened as a zip, so no polling on its size is needed.
    File.open(zip_path, "wb") do |download_file|
      request.on_body do |chunk|
        download_file.write(chunk)
      end
      request.run
    end

    Zip::File.open(zip_path) do |zip_file|
      zip_file.each do |file|
        # Read the entry's contents from inside the archive, not from disk
        xml_string = zip_file.read(file.name)
        check_if_xml_duplicate(xml_string)
        @redis.rpush("NEWS_XML", xml_string)
      end
    end
    File.delete(zip_path)
  end
end

# Remove an existing copy of this document so the push does not create a duplicate
def check_if_xml_duplicate(xml_string)
  @redis.lrem("NEWS_XML", -1, xml_string)
end
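
The duplicate check relies on how Redis LREM treats its count argument: a negative count removes up to that many matching elements, scanning from the tail of the list. The sketch below models that in plain Ruby so it can be run without a Redis server. FakeList and push_unique are hypothetical names for illustration only and are not part of the redis gem.

```ruby
# Pure-Ruby stand-in for a single Redis list. It only mimics the documented
# behaviour of LREM and RPUSH and is not a real client.
class FakeList
  attr_reader :items

  def initialize
    @items = []
  end

  # Mimics LREM: count > 0 scans from the head, count < 0 from the tail,
  # count == 0 removes every match. Returns the number of removed elements.
  def lrem(count, value)
    limit = count.zero? ? @items.length : count.abs
    indexes = @items.each_index.select { |i| @items[i] == value }
    indexes = count.negative? ? indexes.last(limit) : indexes.first(limit)
    # Delete from the highest index down so earlier indexes stay valid.
    indexes.reverse_each { |i| @items.delete_at(i) }
    indexes.length
  end

  # Mimics RPUSH: append to the tail and return the new length.
  def rpush(value)
    @items.push(value)
    @items.length
  end

  # The remove-then-push pattern used by the controller code.
  def push_unique(value)
    lrem(-1, value)
    rpush(value)
  end
end

list = FakeList.new
%w[a b a].each { |xml| list.push_unique(xml) }
puts list.items.inspect
```

Because every push is preceded by a removal, the list never holds more than one copy of a document, which is why removing a single match is enough here. Note that a re-pushed document moves to the tail of the list.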