2013-08-31 103 views
1

How do I scrape HTML using Nokogiri?

I am unable to scrape the product price. The output I get for each price is the following:

<div class="pu-final"> 
    <span class="fk-font-17 fk-bold">Rs. 1999</span> 
</div> 

My code is:

require 'rubygems' 
require 'nokogiri' 
require 'open-uri' 

url = "http://www.flipkart.com/mens-footwear/shoes/casual-shoes/pr?sid=osp,cil,nit,e1f" 
doc = Nokogiri::HTML(URI.open(url)) # URI.open: plain open(url) stopped handling URLs in Ruby 3.0
puts doc.at_css("title").text 
doc.css(".gu4,.browse-product").each do |item| 
    title = item.at_css(".fk-display-block,.title").text 
    puts title 
    puts "=================" 
    price = item.at_css(".pu-final") 
    puts price 
end 

Why are you closing the post? I was writing an answer. –

Answers

2

I tried the same code with one small change, and it works well. Give it a try.

Change

price = item.at_css(".pu-final") 

to

price = item.at_css(".pu-final").text unless item.at_css(".pu-final").nil? 

It works well and as required. Thanks – shamshul2007

0

You can do it as follows:

require 'nokogiri' 

doc = Nokogiri::HTML::Document.parse <<-eotl 
<div class="pu-final"> 

        <span class="fk-font-17 fk-bold">Rs. 1999</span> 
</div> 
eotl 

doc.at_css('div.pu-final > span.fk-font-17.fk-bold').class 
# => Nokogiri::XML::Element 
doc.at_css('div.pu-final > span.fk-font-17.fk-bold').text 
# => "Rs. 1999" 

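doc.at_css('div.pu-final') gives you a Nokogiri::XML::Node (more precisely, a Nokogiri::XML::Element, which is a Node subclass). You then have to call Nokogiri::XML::Node#text to get the text inside the element.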

Using XPath

doc.xpath("normalize-space(//div[contains(@class,'pu-final')]/span[contains(@class,'fk-font-17')])") 
# => "Rs. 1999" 

Full code

require 'nokogiri' 
require 'open-uri' 

url = "http://www.flipkart.com/mens-footwear/shoes/casual-shoes/pr?sid=osp,cil,nit,e1f" 
doc = Nokogiri::HTML(URI.open(url)) # URI.open: plain open(url) stopped handling URLs in Ruby 3.0

doc.css("div.pu-details.lastUnit").each do |dv| 
    product_name = dv.at_css('div.pu-title a').text.strip 
    product_price = dv.xpath("normalize-space(.//div[contains(@class,'pu-final')]/span)").to_s 
    print product_name," <-----> ",product_price,"\n" 
end 

Output

Fila Storm Zender Sneakers <-----> Rs. 1819 
Puma Future Cat M1 Big 102 O Sneakers <-----> Rs. 3849 
Fila Filamotor V4 Sneakers <-----> Rs. 1449 
Adidas Volantis Hiking Shoes <-----> Rs. 2999 
Fila Varsity Sneakers <-----> Rs. 1249 
Puma Evo Speed F1 Low BMW Sneakers <-----> Rs. 2609 
Lee Cooper Running and Walking Shoes <-----> Rs. 1329 
Lee Cooper Running and Walking Shoes <-----> Rs. 1329 
United Colors of Benetton Sneakers <-----> Rs. 2799 
United Colors of Benetton Party Wear Shoes <-----> Rs. 2449 
Timberland 6 In Premium Boots <-----> Rs. 8490 
Timberland Ek Mid Boots <-----> Rs. 8490 
Clarks Montacute Lord Boots <-----> Rs. 3249 
Clarks Latch Mast Corporate Casuals <-----> Rs. 1999 
Levi's Boots <-----> Rs. 2999 

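I tried it, but it gives the same value for every product. How can I tie the price to the specific product so that it is printed alongside the product name? – shamshul2007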


@shamshul2007 What do you mean by the same value? Can you provide more of the relevant HTML? For the HTML you gave, my answer should work for you. –


I want to scrape the page http://www.flipkart.com/mens-footwear/shoes/casual-shoes/pr?sid=osp,cil,nit,e1f and the output should look like product name = product price – shamshul2007