2012-06-12

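em-synchrony nested iterators

I want to write a parser with EventMachine and em-synchrony (parsing the streets and houses for each postal code). The problem is that the site I want to parse has a nested structure: for each postal code there are many pages of streets, and those pages are paginated. So the algorithm is simple: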

  • For each postal code
    • Visit the postal code index page
      • Parse the index page
      • Parse the pagination
      • For each pagination page, parse that page
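
The steps above can be sketched in plain Ruby before any EventMachine code is involved. This is only a minimal sketch under assumptions: `FAKE_SITE`, `fetch` and `crawl` are hypothetical names, and the hash stands in for real HTTP responses and Nokogiri parsing.

```ruby
# Hypothetical stand-in for the site: each URL maps to the streets it lists
# and to the extra pagination pages it links to.
FAKE_SITE = {
  "/10001"        => { streets: ["Main St"], pages: [2, 3] },
  "/10001?page=2" => { streets: ["Oak Ave"], pages: [] },
  "/10001?page=3" => { streets: ["Pine Rd"], pages: [] },
  "/10002"        => { streets: ["Elm St"],  pages: [] }
}.freeze

# Hypothetical fetch-and-parse step; a real parser would issue an HTTP
# request and parse the HTML here.
def fetch(url)
  FAKE_SITE.fetch(url)
end

# Walk every postal code: index page first, then each pagination page.
def crawl(postal_codes)
  postal_codes.each_with_object({}) do |code, result|
    index = fetch("/#{code}")
    streets = index[:streets].dup
    index[:pages].each do |page|
      streets.concat(fetch("/#{code}?page=#{page}")[:streets])
    end
    result[code] = streets
  end
end

p crawl(%w[10001 10002])
```

The two nested `each` loops are the part that has to become non-blocking once the fetches are real HTTP calls.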

Here is a (working) example of such a parser:

require "nokogiri"
require "em-synchrony"
require "em-synchrony/em-http"

# Build the listing URL, optionally for a given pagination page.
def url(page = nil)
  url = "http://gistflow.com/all"
  url << "?page=#{page}" if page
  url
end

EM.synchrony do
  concurrency = 2

  # here [1] is the array of index pages; for this template let it be just [1]
  results = EM::Synchrony::Iterator.new([1], concurrency).map do |index, iter|
    index_page = EM::HttpRequest.new(url).aget

    index_page.callback do
      # here we do some parsing and find out whether the index page
      # has pagination. The worst case is that it has pagination
      pages = [2, 3, 4, 5]

      unless pages.empty?
        # here we need to parse all pages
        # with urls like url(page)
        # how can I do it more efficiently?
      end

      iter.return "SUCC#{index}"
    end

    index_page.errback do
      iter.return "ERR #{index}"
    end
  end

  p results
  EM.stop
end

So the key part is this block:

unless pages.empty? 
    # here we need to parse all pages 
    # with urls like url(page) 
    # how can I do it more efficiently? 
end 

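How can I implement a nested loop of EM HTTP calls inside a synchrony iterator?

I tried different approaches, but every time I either got a "can't yield from root fiber" error or the errback block was called.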
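
The "can't yield from root fiber" error can be reproduced without EventMachine. As far as I understand it, em-synchrony's synchronous calls pause the current fiber with `Fiber.yield`, while `callback` and `errback` blocks are fired by the reactor on the root fiber, where there is nothing to yield back to. The sketch below is plain Ruby and only illustrates that mechanism; `pause_until_response` and `try_pause` are hypothetical names, not library calls, and the exact error message depends on the Ruby version.

```ruby
# Hypothetical stand-in for a synchronous-style call: it pauses the caller
# by yielding the current fiber.
def pause_until_response
  Fiber.yield :waiting
end

# Returns the error class if pausing fails, or nil if the pause was allowed
# and the fiber was later resumed.
def try_pause
  pause_until_response
  nil
rescue FiberError => e
  e.class
end

# On the root fiber (where reactor callbacks run) the yield is rejected.
puts "from the root fiber: #{try_pause.inspect}"

# Inside a fiber the same call simply suspends that fiber.
puts "from inside a fiber: #{Fiber.new { try_pause }.resume.inspect}"
```

This suggests why a synchronous call placed inside a `callback` block fails: the call itself is fine, but it has to run inside a fiber.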

Answer

One solution is to use FiberIterator and the synchronous .get instead of .aget:

require "em-synchrony"
require "em-synchrony/em-http"
require "em-synchrony/fiber_iterator"

def url(page = nil)
  url = "http://gistflow.com/all"
  url << "?page=#{page}" if page
  url
end

EM.synchrony do
  concurrency = 2

  master_pages = [1, 2, 3, 4]

  # Each block runs in its own fiber, so the synchronous .get is allowed here.
  EM::Synchrony::FiberIterator.new(master_pages, concurrency).each do |iter|
    result = EM::HttpRequest.new(url).get
    if result
      puts "SUCC#{iter}"
      detail_pages = [1, 2, 3, 4]
      # Nested iterator over the pagination pages of this master page.
      EM::Synchrony::FiberIterator.new(detail_pages, concurrency).each do |iter2|
        # url(iter2) assumes the detail pages use the ?page= parameter
        result2 = EM::HttpRequest.new(url(iter2)).get
        puts "SUCC/ERR #{iter} > #{iter2}"
      end
    else
      puts "ERR #{iter}"
    end
  end

  EM.stop
end
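
To show why the synchronous style nests cleanly, here is a toy model in plain Ruby. It is not em-synchrony's implementation and it does not model the concurrency limit; `QUEUE`, `async_get`, `sync_get` and `crawl_nested` are hypothetical names. The idea it sketches: a synchronous-looking call registers a callback that resumes the calling fiber, then yields, so ordinary nested loops can be written inside the fiber.

```ruby
require "fiber" # provides Fiber.current on older Rubies; harmless on newer ones

# Toy reactor: callbacks queued here are run later from the root fiber,
# roughly the way an event loop fires callbacks. Hypothetical, not EM.
QUEUE = []

# Hypothetical async call: the block is invoked later by the toy reactor.
def async_get(url, &callback)
  QUEUE << -> { callback.call("body of #{url}") }
end

# Synchronous-looking wrapper: suspend the calling fiber and resume it
# with the response once the callback fires.
def sync_get(url)
  fiber = Fiber.current
  async_get(url) { |body| fiber.resume(body) }
  Fiber.yield
end

# Nested loops: one fiber per index page, then each of its detail pages.
def crawl_nested(index_pages, detail_pages)
  log = []
  index_pages.each do |index|
    Fiber.new do
      log << sync_get("/all?index=#{index}")
      detail_pages.each do |page|
        log << sync_get("/all?index=#{index}&page=#{page}")
      end
    end.resume
  end
  QUEUE.shift.call until QUEUE.empty? # run the toy reactor until it is idle
  log
end

puts crawl_nested([1, 2], [2, 3])
```

Because every `sync_get` runs inside a fiber, none of the calls yields from the root fiber, and the requests of the two index pages interleave instead of blocking each other.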

Thanks for your help, exactly what I needed! – makaroni4