Using Mechanize to go through x number of links and get all the titles?

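Basically, I want to use Mechanize to go through all the pages from A to Z on this site: http://www.tv.com/shows/sort/a_z/

Then, for each letter, I want to get every show title across all of that letter's pages. For now I'm just trying to get it working with the letter "a". This is what I have so far, but I don't know where to go from here.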

require 'mechanize'

agent = Mechanize.new
goog = agent.get "http://www.tv.com/shows/sort/a_z/"
search = goog.link_with(:href => "/shows/sort/a/").click

2014-05-19 HarryLucas

You just need to use some XPath to find what you need and navigate.

require 'mechanize'
require 'pp'

shows = []
agent = Mechanize.new
agent.get 'http://www.tv.com/shows/sort/a_z/'

# Visit every letter link except the one marked as selected.
agent.page.search('//div[@class="alphabet"]//li[not(contains(@class, "selected"))]/a').each do |letter_link|
  agent.get letter_link[:href]
  # Collect the show titles on the first page for this letter.
  agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }

  # Keep following the "next" link until there are no more pages.
  while (next_page_link = agent.page.at('//div[@class="_pagination"]//a[@class="next"]'))
    agent.get next_page_link[:href]
    agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }
  end
end

pp shows

2014-05-19 08:40:56 taro

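Thank you, this works perfectly! I wrote the output to a text file and got about 62k results, more than I expected. It only took about 3 minutes. Thanks for taking the time to write this out for me. – HarryLucas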
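
Writing the titles to a text file could look roughly like the sketch below. The file name `shows.txt` and the two stand-in titles are assumptions; in the real script `shows` would be the array filled by the scraping loop.

```ruby
# Stand-in data; in the real script `shows` is filled by the scraping loop.
shows = ['Example Show A', 'Example Show B']

# Hypothetical output file name.
path = 'shows.txt'

# One title per line, with a trailing newline at the end of the file.
File.write(path, shows.join("\n") + "\n")

# Read the file back to count the lines that were written.
written = File.readlines(path, chomp: true)
puts "Wrote #{written.length} titles to #{path}"
```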

Doesn't this look nicer with CSS, '.pagination .next', than with '//div[@class="_pagination"]//a[@class="next"]'? – pguardiario

@taro How do I get the href from that iteration if my HTML contains wqer23 –
