Parsing RSS descriptions with the Ruby parser

I want to parse the nytimes RSS feed with Ruby's RSS parser:

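require 'rss'
require 'open-uri'

nyt_url = 'http://www.nytimes.com/services/xml/rss/nyt/World.xml'
# URI.open needs Ruby 2.5+; on older Rubies, open-uri's Kernel#open(nyt_url) does the same.
URI.open(nyt_url) do |rss|
    @nyt_feed = RSS::Parser.parse(rss)
end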

And in the view file:

<h2>New York Times Feed</h2> 
<% @nyt_feed.items.each do |item| %> 
    <p> 
    <%= link_to item.title, item.link %> 
    <%= item.description %> 
    </p> 
<% end %>

But the description that comes out looks like this:

Since air assaults by the Assad government picked up two weeks ago, 
knocking rebels in the south on their heels, Syrians have been arriving 
at refuge camps in Jordan at a rate of about 2,000 a night.<img width='1' height='1' 
src='http://rss.nytimes.com/c/34625/f/642565/s/22f90a36/mf.gif' border='0'/><br/><br/><a 
href="http://da.feedsportal.com/r/139263791500/u/0/f/642565/c/34625/s/22f90a36/a2.htm"><img 
src="http://da.feedsportal.com/r/139263791500/u/0/f/642565/c/34625/s/22f90a36/a2.img" 
border="0"/></a><img width="1" height="1" 
src="http://pi.feedsportal.com/r/139263791500/u/0/f/642565/c/34625/s/22f90a36/a2t.img" 
border="0"/>

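I have a similar situation with the Washington Post feed. How can I get the images to actually display, or at least get only the description text? Do I have to use a regular expression for this, or is there a method on the parser object I should be using?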


2012-09-02 John

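Parsing XML or RSS (or HTML) with regular expressions is not a good idea, because it is hard to anticipate every possible way the tags can be nested.

Usually you want to use an XML gem/library to parse your RSS or XML data (such as libxml, Nokogiri, or Ox), but when the XML feed is really large, parsing it can eat up a lot of memory.

Try Ox or Nokogiri and see whether it works better for you than regular expressions.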
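As a rough illustration of the parser-based approach, here is a minimal sketch that keeps only the text of a description and drops the tracking images and ad links. It swaps in REXML (shipped with Ruby) for Ox or Nokogiri so that it runs without installing a gem. The helper name `description_text` and the shortened sample description are made up for this example. REXML only accepts well-formed markup, so for messier real-world HTML the Nokogiri counterpart, `Nokogiri::HTML.fragment(html).text`, is likely the safer choice.

```ruby
require 'rexml/document'

# Hypothetical helper: returns only the text of an RSS description,
# dropping <img>, <a>, <br/> and any other markup.
# Assumes the fragment is well-formed XML (true for the sample below).
def description_text(html)
  # Wrap the fragment in a dummy root so REXML sees a single document element.
  doc = REXML::Document.new("<root>#{html}</root>")
  # Collect every text node at any nesting depth and join them.
  REXML::XPath.match(doc, '//text()').map(&:value).join.strip
end

# Shortened sample modelled on the description in the question;
# the example.com URLs are placeholders.
description =
  'Syrians have been arriving at refuge camps in Jordan at a rate of about 2,000 a night.' \
  '<img width="1" height="1" src="http://example.com/mf.gif" border="0"/><br/><br/>' \
  '<a href="http://example.com/a2.htm"><img src="http://example.com/a2.img" border="0"/></a>'

puts description_text(description)
```

In the view from the question, the helper could then be called as `<%= description_text(item.description) %>`.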

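If your feed is very large and contains many articles, you could try cutting out the individual items/articles with a regular expression and then parsing the content of each one separately with Ox or Nokogiri (this also works well when done in Resque jobs that run in parallel).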
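The cut-then-parse idea could look roughly like the sketch below. The regular expression only locates the `<item>` boundaries; each small chunk is then handed to a real parser. REXML stands in for Ox or Nokogiri here to avoid a gem dependency, and the feed content is invented for the example. It assumes items are never nested and that `</item>` does not appear inside CDATA. Chunks that use namespace prefixes declared on the root `<rss>` element may need those declarations re-added before parsing.

```ruby
require 'rexml/document'

# Invented miniature feed standing in for a very large one.
feed = <<~XML
  <rss version="2.0"><channel>
    <title>Example Feed</title>
    <item><title>First</title><link>http://example.com/1</link></item>
    <item><title>Second</title><link>http://example.com/2</link></item>
  </channel></rss>
XML

# Cut out each <item>...</item> block without parsing the whole feed.
# The match is non-greedy, and the /m flag lets "." span newlines.
item_chunks = feed.scan(%r{<item\b.*?</item>}m)

# Parse each small chunk on its own; this is the part that could be
# handed to separate background jobs.
items = item_chunks.map do |chunk|
  doc = REXML::Document.new(chunk)
  {
    title: doc.root.elements['title'].text,
    link:  doc.root.elements['link'].text
  }
end

items.each { |item| puts "#{item[:title]}: #{item[:link]}" }
```

Because each chunk is a plain string, it can be passed to a background job as an argument without keeping the whole feed in memory on the worker side.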


2013-04-16 22:03:30 Tilo
