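Nested iterators with em-synchrony

I want to write a parser in Ruby with EventMachine and em-synchrony (parsing the streets and houses for postal codes). The problem is that the website I want to parse has a nested structure: for each postal code there are many pages of streets, and those pages are paginated. So the algorithm is simple: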
- for each postal code
  - visit the postal code index page
  - parse the index page
  - parse the pagination
  - for each paginated page, parse that page
Here is an example of such a parser (it works):

    require "nokogiri"
    require "em-synchrony"
    require "em-synchrony/em-http"

    # Build the listing URL, optionally for a specific page.
    def url(page = nil)
      base = "http://gistflow.com/all"
      page ? "#{base}?page=#{page}" : base
    end

    EM.synchrony do
      concurrency = 2

      # here [1] is array of index pages, for this template let it be just [1]
      results = EM::Synchrony::Iterator.new([1], concurrency).map do |index, iter|
        index_page = EM::HttpRequest.new(url).aget

        index_page.callback do
          # here we make some parsing and find out whether index page
          # has pagination. The worst case is that it has pagination
          pages = [2, 3, 4, 5]

          unless pages.empty?
            # here we need to parse all pages
            # with urls like url(page)
            # how can I do it more efficiently?
          end

          iter.return "SUCC#{index}"
        end

        index_page.errback do
          iter.return "ERR #{index}"
        end
      end

      p results
      EM.stop
    end

So the key part is this block:

    unless pages.empty?
      # here we need to parse all pages
      # with urls like url(page)
      # how can I do it more efficiently?
    end

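How can I run a nested loop of EM HTTP calls inside the Synchrony iterator?

I have tried different approaches, but every time either the errback block gets called or I get a "can't yield from root fiber" error.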
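
For context on the "can't yield from root fiber" error, here is a minimal sketch in plain Ruby, with no EventMachine involved. It assumes the fiber-aware calls in em-synchrony pause the current fiber with `Fiber.yield`, while `callback`/`errback` blocks are run by the reactor from the root fiber. `PENDING`, `run_reactor`, `fake_get`, `root_call_fails?` and `fetch_nested` are hypothetical names made up for this illustration, not part of any library.

```ruby
require "fiber" # needed for Fiber.current on older Rubies

# Hypothetical stand-in for the reactor: queued callbacks run later from
# the root fiber, which is how EventMachine fires callback/errback blocks.
PENDING = []

def run_reactor
  PENDING.shift.call until PENDING.empty?
end

# Hypothetical stand-in for a fiber-aware GET: park the calling fiber and
# let the "reactor" resume it with a fake response body.
def fake_get(url)
  fiber = Fiber.current
  PENDING << -> { fiber.resume("body of #{url}") }
  Fiber.yield
end

# Calling the blocking-style helper from the root fiber (for example from
# inside a callback) leaves nothing to pause, so Ruby raises FiberError.
def root_call_fails?
  fake_get("/all")
  false
rescue FiberError
  PENDING.clear # drop the callback queued by the failed call
  true
end

# One fiber per index page; the nested page loop runs inside that fiber,
# so every fake_get has a fiber to pause and resume.
def fetch_nested(indexes)
  results = {}
  indexes.each do |index|
    Fiber.new do
      fake_get("/all")   # index page
      pages = [2, 3]     # pretend these came from its pagination
      results[index] = pages.map { |page| fake_get("/all?page=#{page}") }
    end.resume
  end
  run_reactor
  results
end

p root_call_fails?
p fetch_nested([1])
```

If the same reasoning applies to the real parser, the nested requests would need to run inside a fiber rather than directly inside a `callback` block. em-synchrony's README shows `EM::Synchrony::FiberIterator` together with the fiber-aware `get` (instead of `aget`) for this kind of loop, which may be worth trying for both the outer and the inner iteration.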
Thanks for your help, that is exactly what I needed! – makaroni4