2015-09-20

How do I filter my results when scraping a website with the Nokogiri gem?

I want to scrape the list of restaurants for my postcode from Deliveroo.co.uk.

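I need to add a way to determine whether each restaurant is open or closed. The website shows this very clearly, but I need to update my code to reflect it.

How should I go about this? I think I need to create a "status" variable and then set each restaurant to "open" or "closed".

Here is the page I am trying to scrape: https://deliveroo.co.uk/restaurants/london/maida-vale?postcode=W92DE&time=1800&day=today

My code is below.

Thanks.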

    require 'open-uri'
    require 'nokogiri'
    require 'csv'

    # Store URL to be scraped
    url = "https://deliveroo.co.uk/restaurants/london/maida-vale?postcode=W92DE"

    # Parse the page with Nokogiri
    # (URI.open is required on Ruby 3.0+; a bare open(url) no longer fetches URLs)
    page = Nokogiri::HTML(URI.open(url))

    # Collect each field into its own array
    name = []
    page.css('span.list-item-title.restaurant-name').each do |line|
      name << line.text
    end

    category = []
    page.css('span.restaurant-detail.detail-cat').each do |line|
      category << line.text
    end

    delivery_time = []
    page.css('span.restaurant-detail.detail-time').each do |line|
      delivery_time << line.text
    end

    distance = []
    page.css('span.restaurant-detail.detail-distance').each do |line|
      distance << line.text
    end

    # Not populated yet: should hold "open" or "closed" for each restaurant
    status = []

    # Write data to CSV file
    CSV.open("deliveroo.csv", "w") do |file|
      file << ["Name", "Category", "Delivery Time", "Distance", "Status"]
      name.length.times do |i|
        # status[i] is nil (an empty field) until status is filled in
        file << [name[i], category[i], delivery_time[i], distance[i], status[i]]
      end
    end

Answer

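We need to check whether each li.restaurant--details element has the unavailable class: if it does, the restaurant is closed; if it does not, the restaurant is open.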

status = []
page.css('li.restaurant--details').each do |line|
  # An <li> carrying the "unavailable" class marks a closed restaurant
  if line.attr("class").include? "unavailable"
    sts = "closed"
  else
    sts = "open"
  end
  status << sts
end

By the way, you should strip the surrounding whitespace when you extract the restaurant name and the other fields.

page.css('span.list-item-title.restaurant-name').each do |line|
  name << line.text.strip
end

You can refer to my code here: https://gist.github.com/vinhnglx/4eaeb2e8511dd1454f42
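
To round this off, here is a rough sketch of how the status array could end up as the fifth CSV column. The array contents are placeholder values standing in for scraped data, and CSV.generate is used instead of CSV.open only so that the result can be printed without touching the disk.

```ruby
require 'csv'

# Placeholder values standing in for the scraped arrays (not real data)
name          = ['Pizza Place', 'Sushi Bar']
category      = ['Italian', 'Japanese']
delivery_time = ['30 mins', '45 mins']
distance      = ['0.5 miles', '1.2 miles']
status        = ['open', 'closed']

header = ['Name', 'Category', 'Delivery Time', 'Distance', 'Status']

# zip pairs up the i-th element of every array into one row
rows = name.zip(category, delivery_time, distance, status)

# Build the CSV in memory; CSV.open("deliveroo.csv", "w") takes the same block
csv_text = CSV.generate do |csv|
  csv << header
  rows.each { |row| csv << row }
end

puts csv_text
```

Note that zip pads with nil when one of the other arrays is shorter than name, which shows up as an empty field in the output.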
