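You already have everything you need to build the URLs of the detail pages. Grab each relative URL (I'll call it the path), prepend the base URL, and make a new request.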
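Plain string concatenation works for this site because every path it returns starts with a slash. If you'd rather not rely on that, a minimal alternative is Ruby's standard-library URI.join, which resolves a relative reference against a base URL the way a browser resolves a link. The two sample paths below are ones the listing page returns; the variable names are just for illustration.

```ruby
require 'uri'

base  = 'http://lookbook.nu'
paths = ['/user/1069907-Veronica-P', '/neno']

# Resolve each absolute path ("/...") against the site's base URL.
urls = paths.map { |path| URI.join(base, path).to_s }

# Resolving against the listing page's own URL gives the same result,
# because a path starting with "/" replaces the base URL's path.
from_listing = URI.join('http://lookbook.nu/north-america', '/neno').to_s

puts urls
puts from_listing
```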
require 'mechanize'
agent = Mechanize.new
# The site doesn't send a proper Content-Type, so parse every response as an HTML page.
agent.pluggable_parser.default = Mechanize::Page
base = 'http://lookbook.nu'
page = agent.get(base + '/north-america')
# Relative URLs (paths) of each user's detail page.
detail_pages = page.search("//div[contains(@class, 'look_meta_container')]/p/a[1]/@href").map(&:text)
# ["/user/1069907-Veronica-P", "/elliott_alexzander", "/neno", "/skirtsofurban", "/tovogueorbust", "/dthutt", "/ryapie", "/lovebetweentheracks", "/lonleyboy", "/bobbyraffin", "/tsangtastic", "/user/737385-Katia-H"]
detail_pages.each do |path|
  detail = agent.get(base + path)
  name = detail.search("//div[@id='userheader']//h1/a").text
  fans = detail.search("//span[contains(text(), 'Fans')]/../span[1]").text
  puts "#{name} has #{fans} fans"
end
=>
Veronica P has 26,044 fans
Elliott Alexzander has 3,409 fans
Neno Neno has 15,304 fans
Laura P has 975 fans
Alexandra G. has 620 fans
Dayeanne Hutton has 336 fans
Mariah Alysz has 288 fans
Lina Dinh has 11,675 fans
Talal Amine has 882 fans
Bobby Raffin has 72,469 fans
Jenny Tsang has 8,909 fans
Katia H. has 282 fans
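The two XPath expressions do most of the work. To see what they select without hitting the site, here is a sketch that runs them with Ruby's bundled REXML (instead of the Nokogiri parser Mechanize uses) against two small, made-up fragments. The markup is an assumption that only mimics the structure the expressions expect; it is not lookbook.nu's real HTML.

```ruby
require 'rexml/document'

# Hypothetical listing markup: NOT the real lookbook.nu page.
LISTING = <<~XML
  <html><body>
    <div class="look_meta_container left">
      <p><a href="/user/1069907-Veronica-P">Veronica P</a> <a href="/look/1">look</a></p>
    </div>
    <div class="look_meta_container">
      <p><a href="/neno">Neno Neno</a></p>
    </div>
  </body></html>
XML

# Hypothetical profile markup: NOT the real lookbook.nu page.
PROFILE = <<~XML
  <html><body>
    <div id="userheader"><h1><a href="/neno">Neno Neno</a></h1></div>
    <div><span>15,304</span> <span>Fans</span></div>
  </body></html>
XML

listing = REXML::Document.new(LISTING)
# contains(@class, ...) matches even when the div has extra classes;
# a[1] keeps only the first link in each <p>; @href selects the attribute node.
paths = REXML::XPath.match(
  listing, "//div[contains(@class, 'look_meta_container')]/p/a[1]/@href"
).map(&:value)

profile = REXML::Document.new(PROFILE)
name = REXML::XPath.first(profile, "//div[@id='userheader']//h1/a").text
# Find the span whose text mentions "Fans", step up to its parent,
# then take the first span under that parent.
fans = REXML::XPath.first(profile, "//span[contains(text(), 'Fans')]/../span[1]").text

puts paths.inspect
puts "#{name} has #{fans} fans"
```

Note that the second link in the first paragraph ("/look/1") is skipped because of `a[1]`, and that the fan count comes back as text with a thousands separator, exactly as in the output above.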
Note: I used #pluggable_parser.default to get a Mechanize::Page as the response. Normally you wouldn't need that, but this site doesn't set the Content-Type header correctly.
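Unless you're using a very old version of Ruby, you don't need require 'rubygems'. You don't need require 'nokogiri' either, because it's already a dependency of Mechanize. You probably don't need require 'open-uri' as well, since Mechanize provides its own way of fetching pages.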