How do I download a binary file over HTTP?

117

How do I download and save a binary file over HTTP using Ruby?

The URL is http://somedomain.net/flv/sample/sample.flv.

I am on the Windows platform and I would prefer not to run any external program.

2010-02-15 Radek

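My solution is strongly based on http://snippets.dzone.com/posts/show/2469, which appeared after I typed "ruby file download" into the Firefox address bar... so did you do any research on the internet before asking this question? – 2010-02-15 01:17:15

@Dejw: I did my research and found an answered question here, with basically the same code you gave me. The 'resp.body' part confused me: I thought it would save only the 'body' part of the response, but I want to save the whole/binary file. I also found that http://rio.rubyforge.org/ could be helpful. Moreover, with my question nobody can say that such a question has not been answered yet :-) – Radek 2010-02-15 01:23:28

The body part is exactly the whole file. A response is made up of headers (HTTP) and a body (the file), so when you save the body you have saved the file ;-) – 2010-02-15 01:54:57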

126

The simplest way is the platform-specific solution:

Probably you are looking for:

require 'net/http'
# Must be somedomain.net instead of somedomain.net/, otherwise, it will throw exception.
Net::HTTP.start("somedomain.net") do |http|
    resp = http.get("/flv/sample/sample.flv")
    open("sample.flv", "wb") do |file|
        file.write(resp.body)
    end
end
puts "Done."

Edit: Changed. Thank you.

Edit2: The solution which saves part of a file while downloading:

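# instead of http.get
f = open('sample.flv', 'wb')    # open for writing in binary mode
begin
    http.request_get('/flv/sample/sample.flv') do |resp|
        # read_body yields the body piece by piece as it arrives
        resp.read_body do |segment|
            f.write(segment)
        end
    end
ensure
    f.close()
end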

2010-02-15 01:09:29

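The first "simple" solution will not work on a Windows machine. – srcspider 2013-01-17 15:52:02

+14

Yes, I know. That is why I said it is 'a platform-specific solution'. – 2013-01-17 21:28:00

More platform-specific solutions: GNU/Linux platforms provide 'wget'. OS X provides 'curl' ('curl http://oh.no/its/pbjellytime.flv --output secretlylove.flv'). Windows has a PowerShell equivalent: '(New-Object System.Net.WebClient).DownloadFile('http://oh.no/its/pbjellytime.flv','C:\tmp\secretlylove.flv')'. Binaries of wget and curl are also available for download for all operating systems. I still highly recommend using the standard library unless you are writing the code solely for yourself. – fny 2013-01-23 12:51:34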

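Example 3 in Ruby's net/http documentation shows how to download a document over HTTP. To output the file instead of just loading it into memory, substitute puts with a binary write to a file, e.g. as shown in Dejw's answer.

More complex cases are shown further down in the same document.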
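
As a rough sketch of the streaming approach from the net/http documentation, the helper below writes each chunk to disk as it arrives instead of collecting the whole body first. The name stream_to_file and its arguments are assumptions made for illustration. The sketch does not follow redirects and treats any non-2xx response as an error.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Net::HTTP): streams the body of `url`
# into `path` chunk by chunk, so the whole file is never held in memory.
def stream_to_file(url, path)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      # Raises for any non-2xx status, so an error page is never saved as the file.
      response.value
      # "wb" is binary mode; on Windows it prevents newline translation.
      File.open(path, 'wb') do |file|
        response.read_body { |chunk| file.write(chunk) }
      end
    end
  end
  path
end
```

Calling stream_to_file("http://somedomain.net/flv/sample/sample.flv", "sample.flv") would save the file from the question into the current directory.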

2010-02-15 01:15:27 Arkku

+1 for pointing to existing documentation and further examples. – semperos 2010-12-29 18:04:55

The specific link is: http://ruby-doc.org/stdlib-2.1.4/libdoc/net/http/rdoc/Net/HTTP.html#class-Net::HTTP-label-Streaming+Response+Bodies – kgilpin 2014-10-29 20:11:34

Expanding on Dejw's answer (Edit2):

require 'net/http'
require 'uri'

File.open(filename, 'wb') { |f|    # binary mode, so Windows does not translate newlines
    uri = URI.parse(url)
    Net::HTTP.start(uri.host, uri.port) { |http|
        http.request_get(uri.path) { |res|
            res.read_body { |seg|
                f << seg
                # hack -- adjust to suit:
                sleep 0.005
            }
        }
    }
}

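where filename and url are both strings.

The sleep command is a hack that can dramatically reduce CPU usage when the network is the limiting factor. Net::HTTP does not wait for the buffer (16kB in v1.9.2) to fill before yielding, so the CPU keeps itself busy moving small chunks around. Sleeping for a moment gives the buffer a chance to fill between writes, and CPU usage becomes comparable to a curl solution, a 4-5x difference in my application. A more robust solution might examine the progress of f.pos and adjust the timeout to target, say, 95% of the buffer size. In fact, that is how I arrived at the 0.005 figure in my example.

Sorry, but I don't know a more elegant way of having Ruby wait for the buffer to fill.

Edit:

This is a version that automatically adjusts itself to keep the buffer just at or below capacity. It is an inelegant solution, but it seems to be just as fast, and to use as little CPU time, as calling out to curl.

It works in three stages. A brief learning period with a deliberately long sleep time establishes the size of a full buffer. The drop period then reduces the sleep time quickly on each iteration, dividing it by a larger factor, until it finds an under-filled buffer. After that, during the normal period, it adjusts the sleep time up and down by a smaller factor.

My Ruby is a little rusty, so I'm sure this can be improved upon. First of all, there is no error handling. Also, maybe it could be separated into an object, away from the downloading itself, so that you would just call autosleep.sleep(f.pos) in your loop? Even better, Net::HTTP could be changed to wait for a full buffer before yielding :-)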

require 'net/http'
require 'uri'

def http_to_file(filename, url, opt = {})
    opt = {
        :init_pause => 0.1,    # start by waiting this long each time;
                               # it's deliberately long so we can see
                               # what a full buffer looks like
        :learn_period => 0.3,  # keep the initial pause for at least this many seconds
        :drop => 1.5,          # fast reducing factor to find roughly optimized pause time
        :adjust => 1.05        # during the normal period, adjust up or down by this factor
    }.merge(opt)
    pause = opt[:init_pause]
    learn = 1 + (opt[:learn_period] / pause).to_i
    drop_period = true
    delta = 0
    max_delta = 0
    last_pos = 0
    File.open(filename, 'wb') { |f|
        uri = URI.parse(url)
        Net::HTTP.start(uri.host, uri.port) { |http|
            http.request_get(uri.path) { |res|
                res.read_body { |seg|
                    f << seg
                    delta = f.pos - last_pos
                    last_pos += delta
                    if delta > max_delta then max_delta = delta end
                    if learn > 0 then    # learning period: keep the initial pause
                        learn -= 1
                    elsif delta == max_delta then
                        if drop_period then
                            pause /= opt[:drop]
                        else
                            pause /= opt[:adjust]
                        end
                    elsif delta < max_delta then
                        drop_period = false
                        pause *= opt[:adjust]
                    end
                    sleep(pause)
                }
            }
        }
    }
end
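
The idea mentioned above, of moving the pause logic into its own object so that the loop only calls autosleep.sleep(f.pos), could be sketched roughly as follows. The class name AutoSleep, its keyword arguments, and the injectable sleeper are assumptions made for illustration. The defaults mirror the options of http_to_file, and the three stages (learning, drop, normal) follow the description in this answer.

```ruby
# Hypothetical AutoSleep class: the three-stage pause tuning from
# http_to_file, separated from the download loop itself.
class AutoSleep
  attr_reader :pause

  def initialize(init_pause: 0.1, learn_period: 0.3, drop: 1.5, adjust: 1.05,
                 sleeper: ->(seconds) { Kernel.sleep(seconds) })
    @pause = init_pause
    @learn = 1 + (learn_period / init_pause).to_i
    @drop = drop
    @adjust = adjust
    @sleeper = sleeper      # injectable, so the logic can be exercised without waiting
    @drop_period = true
    @max_delta = 0
    @last_pos = 0
  end

  # Call once per chunk with the current file position (e.g. f.pos).
  def sleep(pos)
    delta = pos - @last_pos
    @last_pos = pos
    @max_delta = delta if delta > @max_delta
    if @learn > 0
      @learn -= 1                                 # learning: keep the long initial pause
    elsif delta == @max_delta
      @pause /= (@drop_period ? @drop : @adjust)  # full buffer: shorten the pause
    else
      @drop_period = false                        # under-filled buffer: lengthen it slightly
      @pause *= @adjust
    end
    @sleeper.call(@pause)
  end
end
```

Inside the read_body block, the loop body would then reduce to f << seg followed by autosleep.sleep(f.pos), where autosleep is an AutoSleep.new created before the download starts.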

2011-08-06 01:24:49 Isa

I like the 'sleep' hack! – Radek 2011-08-06 04:02:12

I had problems when the file contained German umlauts (ä, ö, ü). I was able to solve the problem by using:

ec = Encoding::Converter.new('iso-8859-1', 'utf-8') 
... 
f << ec.convert(seg) 
...
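
A small self-contained sketch of that conversion step, assuming the downloaded data is text encoded as ISO-8859-1 (this is not appropriate for truly binary files such as .flv). The helper name latin1_chunks_to_utf8 is hypothetical. Because ISO-8859-1 is a single-byte encoding, converting chunk by chunk cannot split a character.

```ruby
# Hypothetical helper: converts ISO-8859-1 chunks (as yielded by read_body)
# to UTF-8 and returns the joined text.
def latin1_chunks_to_utf8(chunks)
  converter = Encoding::Converter.new('iso-8859-1', 'utf-8')
  # convert works on the raw bytes of each chunk and returns a UTF-8 string
  chunks.map { |chunk| converter.convert(chunk) }.join
end
```

In the download loop this corresponds to f << ec.convert(seg), as in the snippet above.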

2011-11-17 16:02:12 Rolf

109

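I know this is an old question, but Google brought me here and I think I found a simpler answer.

In Railscasts #179, Ryan Bates used the Ruby standard class OpenURI to do much of what was asked, like this:

(Warning: untested code. You might need to change/tweak it.)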

require 'open-uri'

File.open("/my/local/path/sample.flv", "wb") do |saved_file|
    # URI.open is provided by open-uri (Ruby 2.5+); on Ruby 3.0 and later a bare
    # "open" no longer accepts URLs
    URI.open("http://somedomain.net/flv/sample/sample.flv", "rb") do |read_file|
        saved_file.write(read_file.read)
    end
end

2012-02-16 13:16:10 kikito

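'open("http://somedomain.net/flv/sample/sample.flv", 'rb')' will open the URL in binary mode. – zoli 2012-09-25 19:21:50

@zoli: awesome. Updated my answer, thanks! – kikito 2012-09-26 10:10:30

Does anybody know whether open-uri is intelligent about filling the buffer, as explained by @Isa? – gdelfino 2012-10-26 21:28:37

There are more API-friendly libraries than Net::HTTP, for example httparty: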

require "httparty" 
File.open("/tmp/my_file.flv", "wb") do |f| 
    f.write HTTParty.get("http://somedomain.net/flv/sample/sample.flv").parsed_response 
end

2013-08-27 20:21:33 fguillen

You can use open-uri, which is a one-liner:

require 'open-uri'
content = URI.open('http://example.com').read  # URI.open needs Ruby 2.5+; older versions used a bare open

Or by using net/http:

require 'net/http' 
File.write("file_name", Net::HTTP.get(URI.parse("http://url.com")))

2013-11-07 14:59:04 KrauseFx

This reads the whole file into memory before writing it to disk, so... that can be bad. – kgilpin 2014-10-29 20:07:07

@kgilpin both solutions? – KrauseFx 2014-10-29 20:34:53

Yes, both solutions. – eltiare 2015-05-17 20:12:36

Here is my Ruby HTTP-to-file solution using IO::copy_stream(src, dst).

require "open-uri" 

def download(url, path)
    File.open(path, "w") do |f|
        IO.copy_stream(open(url), f)
    end
end

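The main advantage here is that it reads and writes in chunks, so it does not read the entire response into memory.

I use open(name, *rest, &block) only for the purpose of this demonstration. The first argument of IO::copy_stream(src, dst) can be any IO object that responds to read.

Be careful with user-provided input! open(name, *rest, &block) is unsafe if name comes from user input!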
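
One possible way to address the warning about user input, as a sketch rather than a definitive implementation: parse the URL first, accept only http and https, and let open-uri open the parsed URI object, so the string never reaches Kernel#open. The scheme check and the ArgumentError are assumptions added for illustration. Note that open-uri itself spools the response into a StringIO or Tempfile before yielding it, so IO.copy_stream here copies from that buffer in chunks.

```ruby
require 'open-uri'
require 'uri'

# Hypothetical variant of download: only http(s) URLs are accepted.
def download(url, path)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  raise ArgumentError, "unsupported URL: #{url}" unless uri.is_a?(URI::HTTP)

  # URI::HTTP#open comes from open-uri (OpenURI::OpenRead).
  uri.open do |remote|
    # "wb" keeps the bytes untouched on Windows as well.
    File.open(path, 'wb') { |file| IO.copy_stream(remote, file) }
  end
  path
end
```

With this variant, a name such as "| ls" or "file:///etc/passwd" is rejected (by URI.parse or by the scheme check) before anything is opened.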

2015-11-16 22:58:16 Overbryd

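This should be the accepted answer, as it is concise and simple and does not load the whole file into memory ~ + performance (a guess here). – Nikkolasg 2016-09-12 13:56:41

I agree with Nikkolasg. I just tried to use it and it works very well. I modified it a bit, though: for example, the local path is inferred automatically from the given URL, e.g. "path = nil" followed by a check for nil; if it is nil, I use File.basename() on the URL to infer the local path. – shevy 2017-07-03 11:41:12

I wonder why it works correctly with "w". Will it work on Windows, or is it better to use "wb" instead? – sekrett 2017-12-04 10:59:11

If you are looking for a way to download a temporary file, do something with it, and then delete it, try this gem: https://github.com/equivalent/pull_tempfile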
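
The two comments above can be illustrated with a short sketch. The helper names local_path_for and write_binary are hypothetical. The first infers the local file name from the URL when no path is given, as shevy describes. The second writes with "wb", which is the safer choice on Windows because text mode ("w") translates "\n" to "\r\n" there and would corrupt binary data; on Unix-like systems both modes write the same bytes.

```ruby
require 'uri'

# Hypothetical helper: returns `path` if given, otherwise the last
# segment of the URL path (e.g. "sample.flv").
def local_path_for(url, path = nil)
  return path unless path.nil?

  name = File.basename(URI.parse(url).path)
  raise ArgumentError, "cannot infer a file name from #{url}" if name.empty? || name == '/'
  name
end

# Hypothetical helper: writes `data` byte for byte, without newline translation.
def write_binary(path, data)
  File.open(path, 'wb') { |file| file.write(data) }
end
```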

require 'csv'
require 'pull_tempfile'

PullTempfile.transaction(url: 'https://mycompany.org/stupid-csv-report.csv', original_filename: 'dont-care.csv') do |tmp_file|
    CSV.foreach(tmp_file.path) do |row|
        # ....
    end
end

2016-03-22 11:03:25 equivalent8
