Scraping with Mechanize and Ruby

I want to scrape the following website, because the site's XML is malformed and does not contain all the data I need to parse:

http://www.cafebonappetit.com/menu/your-cafe/pitzer

When I fetch the page with Mechanize, however, I only get:

{meta_refresh} 
{title "Collins | Claremont McKenna Cafés | Café Bon Appétit"} 
{iframes} 
{frames} 
{links 
#<Mechanize::Page::Link "Welcome" "http://www.cafebonappetit.com/"> 
#<Mechanize::Page::Link "Our Approach" "javascript://"> 
#<Mechanize::Page::Link 
"Kitchen Principles" 
"http://www.cafebonappetit.com/our-approach/kitchen-principles"> 
..... 
}

Unfortunately, what I actually need is the content of the tables (I'm guessing they are in iframes). Any ideas?

Thanks!


2012-04-29 AlexSBerman

The page doesn't have any frames or iframes. Mechanize is just reporting that there are 0 iframes, 0 frames, N links and 1 title. To find the tables, just use page.search('table').

Thanks! #railsnewb – AlexSBerman

Here is a simple Mechanize + Nokogiri script for scraping the menu items.

require 'rubygems' 
require 'mechanize' 
require 'pp' 

agent = Mechanize.new 
url = "http://www.cafebonappetit.com/menu/your-cafe/pitzer" 
page = agent.get(url) 

# Grab each daily menu
page.search('div#menu-items > table.my-day-menu-table').each do |menu|
  # The day heading is the link in the div just before the table
  day = menu.xpath('preceding-sibling::div[1]/a').text.strip
  puts day
  fare = []
  # Collect the menu items, one entry per table row
  menu.xpath('tr').each do |item|
    fare << item.xpath('td/strong').map(&:text).join(": ")
  end
  pp fare
end

Result (excerpt):

Sunday, May 6th, 2012 
["Brunch", 
"chef's table: custom omelet bar", 
"main plate: chicken sanchez", 
"meatless chicken and sauce", 
"options: banana pancakes", 
"stocks: beed barley", 
"vegetable minestrone", 
"Lunch", 
"main plate: steamed broccoli", 
"Dinner", 
"chef's table: pasta bar", 
"farm to fork: sauteed rainbow chard", 
"options: mozzarella sticks", 
"ovens: pizza bar", 
"main plate: roasted herb chicken", 
"baked ziti pasta", 
"steamed carrots and parsnips"]


2012-05-04 01:20:46
