How do I select a specific div element with XPath and Nokogiri?

I'm fairly new to parsing and would like to get more practice. I want to parse the following URL: http://www.goodreads.com/quotes/tag/hard-work.

I want to grab all the quotes tagged "hard work". Here is the site's markup, broken down:

<div class="content"> 
<div id="siteheader" class="uitext"> 
<div class="mainContentContainer "> 
<div class="mainContent"> 
<div id="premiumAdTop"> 
<div class="mainContentFloat"> 
<div id="flashContainer"> </div> 
<div id="connectPrompt" style=""> 
<img style="float: left; margin: -3px 5px 0px 0px" src="http://s.gr-assets.com/assets/quote/quote_tiny-566b7de5e1ac5becd0dd8b2856f59228.jpg" alt="quote"> 
<h1>Quotes About Hard Work</h1> 
<div class="leftContainer"> 
<div class="mediumText"> 
<div class="quote mediumText "> 
<div class="quoteDetails "> 
<a class="leftAlignedImage" href="/author/show/3916262.Babe_Ruth"> 
<div class="quoteText"> 
“It's hard to beat a person who never gives up.” 
<br> 
― 
<a href="/author/show/3916262.Babe_Ruth">Babe Ruth</a> 
</div>

My current code is:

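require "rubygems" 
require "open-uri" 
require "nokogiri" 

# Fetch and parse the page (URI.open: calling bare open on a URL was removed in Ruby 3.0)
@page = Nokogiri::HTML(URI.open("http://goodreads.com/quotes")) 
# Select the first div that is a direct child of <body>
@div = @page.xpath("html/body/div[1]")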

But the result doesn't give me the output I want.

I think I should call the methods each and collect, but I just don't know how to get to the node I want. I believe it is somewhere in here:

<div id="connectPrompt" style=""> 
<img style="float: left; margin: -3px 5px 0px 0px" src="http://s.gr-assets.com/assets/quote/quote_tiny-566b7de5e1ac5becd0dd8b2856f59228.jpg" alt="quote"> 
<h1>Quotes About Hard Work</h1> 
<div class="leftContainer"> 
<div class="mediumText"> 
<div class="quote mediumText "> 
<div class="quoteDetails "> 
<a class="leftAlignedImage" href="/author/show/3916262.Babe_Ruth"> 
<div class="quoteText"> 
“It's hard to beat a person who never gives up.” 
<br> 
― 
<a href="/author/show/3916262.Babe_Ruth">Babe Ruth</a> 
</div>

Can someone point me in the right direction? Which div class do I need to get into to get what I want?


2013-12-15 Uzzar

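You say you want to parse 'http://www.goodreads.com/quotes/tag/hard-work', but your code fetches 'http://www.goodreads.com/quotes', so which is it? Also, you don't say what you want to extract from the page: just the quote text, the div immediately around it, the quote plus its author, or something else inside another enclosing div. You need to be more specific. – matt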

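Hi Matt! I want to extract all the quotes tagged "hard work" on www.goodreads.com. It seems to me that the only way to do that is to parse http://goodreads.com/quotes. I want the quote and the author's name. Hope that helps, and thanks for your help. – Uzzar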

You can use the XPath:

//div[@class = 'quoteText' and following-sibling::div[1][@class = 'quoteFooter' and .//a[@href and normalize-space() = 'hard-work']]]

This selects every div element with class quoteText whose first following sibling div has class quoteFooter and contains a link whose text is hard-work.


2013-12-15 18:53:08

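Hi Martin Honnen! Your feedback helped a lot; thanks. It doesn't get all the quotes tagged "hard work", only the ones on the first page (there are 5 pages in total). I'm working with the XPath you provided above and hope to adapt it to get what I need. Thanks again for the help. PS: any recommended resources? I want to practice a lot, and I need to get really comfortable with HTML/CSS to become decent at parsing. I'm willing to put in the time and work, and would appreciate tips on good beginner-friendly resources. Thanks! – Uzzar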
