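Extracting the value of the content attribute?

This is my first question here, and it would be awesome to find an answer. I am new to Nokogiri.

Here is my problem. I have something like this in the HTML head of the target site (here, a TechCrunch post):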

<meta content="During my time at TechCrunch I've seen thousands of startups and written about hundreds of them. I sure as hell don't know all ..." name="description"/>

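I now want a script that runs through the meta tags, finds the one whose name attribute is "description", and gets the value of its content attribute.

I have tried something like this: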

require 'rubygems' 
require 'nokogiri' 
require 'open-uri' 

url = "http://www.techcrunch.com/2009/10/11/the-underutilized-power-of-the-video-demo-to-explain-what-the-hell-you-actually-do/" 
doc = Nokogiri::HTML(open(url)) 
posts = doc.xpath("//meta") 
posts.each do |link| 
    a = link.attributes['name'] 
    b = link.attributes['content'] 
end

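Afterwards I could select the tag whose name attribute equals "description", but this code returns nil for a and b.

I played around with posts = doc.xpath("//meta"), posts = doc.xpath("//meta/*"), etc., but still got nil.

Source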

2010-01-04 Stevensson

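The problem is not the XPath; the document does not seem to be parsed. You can check with 'puts doc': it does not contain the full input. – akuhn 2010-01-05 01:43:35

The problem is not with the XPath; it seems the document is not being parsed. You can check with puts doc: the output does not contain the full input. The cause appears to be a problem parsing comments (I suspect either invalid HTML or a bug in libxml2).

In your case I would use a regular expression as a workaround. Given that <meta> tags are fairly simple, something like /<meta name="([^"]*)" content="([^"]*)"/ might work.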
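
A minimal sketch of that regex workaround, using only the Ruby standard library. Note that the tag quoted in the question puts content before name, so a pattern that expects name first would not match it; this sketch checks the two attributes separately so that either order works. The helper name meta_description is hypothetical, and the pattern assumes double-quoted attribute values with no ">" inside them.

```ruby
# Sample markup built from the tag quoted in the question.
html = <<~HTML
  <head>
  <meta content="During my time at TechCrunch I've seen thousands of startups and written about hundreds of them. I sure as hell don't know all ..." name="description"/>
  </head>
HTML

# Hypothetical helper: returns the content attribute of the first <meta>
# tag whose name is "description", or nil if there is none.
# Assumes double-quoted attribute values that do not contain ">".
def meta_description(html)
  html.scan(/<meta\b[^>]*>/i).each do |tag|
    # Skip meta tags that are not the description tag.
    next unless tag.match?(/\bname="description"/i)

    # Attribute order does not matter because each one is matched on its own.
    m = tag.match(/\bcontent="([^"]*)"/i)
    return m[1] if m
  end
  nil
end

puts meta_description(html)
```

This is only a stopgap for markup the parser chokes on; it will miss single-quoted or unquoted attributes.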

Source

2010-01-05 02:00:32 akuhn

You should change

doc = Nokogiri::HTML(open(url))

to

doc = Nokogiri::HTML(open(url).read)

Update: maybe not :) Actually your code works for me, using Ruby 1.8.7 / Nokogiri 1.4.0.

Source

2010-01-05 16:24:46 mykhal
