Celluloid & high-performance HTTP requests

I'm trying to switch an existing crawler from EventMachine to Celluloid. To get familiar with Celluloid, I generated a bunch of 150 kB static files on a Linux box, all of them served by Nginx.
The code at the bottom should do the job, but there is something about it I don't understand: with a pool size of 50, the code should spawn at most 50 threads, yet it spawns 180 of them. If I increase the pool size to 100, 330 threads get spawned. What's going wrong there?
The code is a simple copy-and-paste affair that should run on any box, so any clue is welcome :)
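For reference, thread counts like the ones above can be sampled from inside the process itself; here is a minimal stdlib-only sketch of that measurement (the `thread_count` helper is mine, not part of the script below):

```ruby
# Thread.list returns every live Ruby thread in the current process,
# which is roughly what external tools such as `top -H` report per VM.
def thread_count
  Thread.list.size
end

before  = thread_count
workers = 10.times.map { Thread.new { sleep 0.2 } }
puts "Extra threads while running: #{thread_count - before}"  # prints 10
workers.each(&:join)
```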
#!/usr/bin/env jruby
require 'celluloid'
require 'open-uri'

URLS = *(1..1000)

@@requests   = 0
@@responses  = 0
@@total_size = 0

class Crawler
  include Celluloid

  def fetch(id)
    uri = URI("http://data.asconix.com/#{id}")
    puts "Request ##{@@requests += 1} -> #{uri}"
    begin
      req = open(uri).read
    rescue Exception => e
      puts e
    end
  end
end

URLS.each_slice(50).map do |idset|
  pool = Crawler.pool(size: 50)
  crawlers = idset.to_a.map do |id|
    begin
      pool.future(:fetch, id)
    rescue Celluloid::DeadActorError, Celluloid::MailboxError
    end
  end
  crawlers.compact.each do |resp|
    $stdout.print "Response ##{@@responses += 1} -> "
    if resp.value.size == 150000
      $stdout.print "OK\n"
      @@total_size += resp.value.size
    else
      $stdout.print "ERROR\n"
    end
  end
  pool.terminate
  puts "Actors left: #{Celluloid::Actor.all.to_set.length} -- Alive: #{Celluloid::Actor.all.to_set.select(&:alive?).length}"
end

$stdout.print "Requests total: #{@@requests}\n"
$stdout.print "Responses total: #{@@responses}\n"
$stdout.print "Size total: #{@@total_size} bytes\n"
By the way, the same problem occurs when I define the pool outside the each_slice loop:
....
@pool = Crawler.pool(size: 50)

URLS.each_slice(50).map do |idset|
  crawlers = idset.to_a.map do |id|
    begin
      @pool.future(:fetch, id)
    rescue Celluloid::DeadActorError, Celluloid::MailboxError
    end
  end
  crawlers.compact.each do |resp|
    $stdout.print "Response ##{@@responses += 1} -> "
    if resp.value.size == 150000
      $stdout.print "OK\n"
      @@total_size += resp.value.size
    else
      $stdout.print "ERROR\n"
    end
  end
  puts "Actors left: #{Celluloid::Actor.all.to_set.length} -- Alive: #{Celluloid::Actor.all.to_set.select(&:alive?).length}"
end
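The behaviour I expect from the pool can be sketched with plain Ruby threads (stdlib only, so it runs without Celluloid; `Thread#value` stands in for `future.value`, and the doubling block stands in for `fetch`):

```ruby
ids = (1..20).to_a
responses = 0

ids.each_slice(5) do |idset|
  # One thread per id in the slice, analogous to pool.future(:fetch, id)
  futures = idset.map { |id| Thread.new(id) { |i| i * 2 } }
  # Thread#value joins the thread and returns its block's result,
  # much like Celluloid::Future#value blocks until the result is ready
  futures.each do |f|
    f.value
    responses += 1
  end
end

puts "Responses total: #{responses}"  # prints 20
```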