Hpricot XML text search

Hpricot + Ruby XML parsing and logical selection.

Goal: find all titles written by the author Bob.

My XML file:

<rss> 
<channel> 
<item> 
<title>Book1</title> 
<pubDate>march 1 2010</pubDate> 
<author>Bob</author> 
</item> 

<item> 
<title>book2</title> 
<pubDate>october 4 2009</pubDate> 
<author>Bill</author> 
</item> 

<item> 
<title>book3</title> 
<pubDate>June 5 2010</pubDate> 
<author>Steve</author> 
</item> 
</channel> 
</rss> 

# My Hpricot code. Running it produces no output, although the search pattern works on its own.
(doc % :rss % :channel/:item).each do |item|
  a = item.search("author[text()*='Bob']")

  # puts "FOUND" if a.include?"Bob"
  puts item.at("title") if a.include?"Bob"
end

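Source

2011-02-11 Dejan

I have never seen accessors like `(doc % :rss % :channel/:item)` used with Hpricot or Nokogiri. – 2011-02-11 23:36:26

One of the ideas behind XPath is that it lets us navigate the DOM much like the directories on a disk: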

require 'hpricot' 

xml = <<EOT 
<rss> 
    <channel> 
     <item> 
      <title>Book1</title> 
      <pubDate>march 1 2010</pubDate> 
      <author>Bob</author> 
     </item> 

     <item> 
      <title>book2</title> 
      <pubDate>october 4 2009</pubDate> 
      <author>Bill</author> 
     </item> 

     <item> 
      <title>book3</title> 
      <pubDate>June 5 2010</pubDate> 
      <author>Steve</author> 
     </item> 

     <item> 
      <title>Book4</title> 
      <pubDate>march 1 2010</pubDate> 
      <author>Bob</author> 
     </item> 

    </channel> 
</rss> 
EOT 

doc = Hpricot(xml) 

titles = (doc/'//author[text()="Bob"]/../title') 
titles # => #<Hpricot::Elements[{elem <title> "Book1" </title>}, {elem <title> "Book4" </title>}]>

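This means: "Find all the author tags whose text is Bob, then go up one level and look in the title tag."

I added a second book by "Bob" to check that every occurrence is found.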
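The same walk can be spelled out one step at a time, which may make the `/../title` part of the path easier to see. This is only a minimal sketch: it assumes REXML (bundled with standard Ruby installs) in place of Hpricot so that it runs without extra gems, and the trimmed XML and the variable names are illustrative rather than taken from the original code.

```ruby
require 'rexml/document'

# Trimmed copy of the feed above (pubDate omitted for brevity).
xml = <<~EOT
  <rss>
    <channel>
      <item><title>Book1</title><author>Bob</author></item>
      <item><title>book2</title><author>Bill</author></item>
      <item><title>book3</title><author>Steve</author></item>
      <item><title>Book4</title><author>Bob</author></item>
    </channel>
  </rss>
EOT

doc = REXML::Document.new(xml)

# "//author": every <author> element anywhere in the document.
authors = REXML::XPath.match(doc, '//author')

# '[text()="Bob"]': keep only the authors whose text is exactly "Bob".
bobs = authors.select { |author| author.text == 'Bob' }

# "/..": step up from each <author> to its enclosing <item>.
items = bobs.map(&:parent)

# "/title": step down into the <title> child of each <item>.
bob_titles = items.map { |item| item.elements['title'].text }

puts bob_titles.inspect # prints ["Book1", "Book4"]
```

Each intermediate variable corresponds to one segment of the XPath expression, so the one-line path and the step-by-step version should select the same titles.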

To get the items that contain a book by Bob, just move back up one level:

items = (doc/'//author[text()="Bob"]/..') 
puts items # => nil 
# >> <item> 
# >>    <title>Book1</title> 
# >>    <pubdate>march 1 2010</pubdate> 
# >>    <author>Bob</author> 
# >>   </item> 
# >> <item> 
# >>    <title>Book4</title> 
# >>    <pubdate>march 1 2010</pubdate> 
# >>    <author>Bob</author> 
# >>   </item>

I also worked out what (doc % :rss % :channel/:item) is doing. It is equivalent to nested searches, minus the wrapping parentheses, and these should all be the same in Hpricot-ese:

(doc % :rss % :channel/:item).size # => 4 
(((doc % :rss) % :channel)/:item).size # => 4 
(doc/'//rss/channel/item').size # => 4 
(doc/'rss channel item').size # => 4

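Because '//rss/channel/item' is how you would normally see an XPath accessor written, and 'rss channel item' is a CSS accessor, I recommend using those formats for maintainability and clarity.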
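As for why the loop in the question prints nothing, one likely cause is that `search` returns a collection of element objects, so `a.include?"Bob"` asks whether that collection contains the String "Bob", which is never the case. The sketch below illustrates the distinction. It assumes REXML instead of Hpricot so that it runs on a stock Ruby, and the names `by_include` and `by_text` are made up for the illustration. In Hpricot, checking `a.empty?` or comparing each author's `inner_text` should be the analogous fix, though that is not tested here.

```ruby
require 'rexml/document'

xml = <<~EOT
  <channel>
    <item><title>Book1</title><author>Bob</author></item>
    <item><title>book2</title><author>Bill</author></item>
  </channel>
EOT

doc = REXML::Document.new(xml)

by_include = [] # titles selected with the question's include? test
by_text    = [] # titles selected by comparing the author's text

REXML::XPath.match(doc, '//item').each do |item|
  authors = REXML::XPath.match(item, 'author') # Array of element objects
  title   = item.elements['title'].text

  # An element object is never == to a String, so this never matches.
  by_include << title if authors.include?('Bob')

  # Comparing the text inside each element is what was intended.
  by_text << title if authors.any? { |author| author.text == 'Bob' }
end

puts by_include.inspect # prints []
puts by_text.inspect    # prints ["Book1"]
```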

Source

2011-02-12 00:18:49

If you are not set on Hpricot, here is one way to do this with XPath in Nokogiri:

require 'nokogiri' 
doc = Nokogiri::XML(my_rss_string) 
bobs_titles = doc.xpath("//title[parent::item/author[text()='Bob']]") 
p bobs_titles.map{ |node| node.text } 
#=> ["Book1"]

Edit: @theTinMan's XPath also works well, is more readable, and is probably faster:

bobs_titles = doc.xpath("//author[text()='Bob']/../title")

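Source

2011-02-11 22:01:35 Phrogz

+1 for Nokogiri. It is backed by a large group of Ruby developers and, like Homer Simpson's donut, there is nothing it can't do. – 2011-02-11 23:34:25

Yes, Nokogiri is great; I don't see a reason to run Hpricot these days. – 2011-02-12 00:20:43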
