Getting the link href from <a> tags with Nokogiri

I'm scraping some data located at /h2/a, and each a's href should contain http://www.thedomain.com. All the links look like thedomain.com/test and so on. Right now I only get the link text, not the href itself.

For example:

<h2> 
<a href="http://www.thedomain.com/test">Hey there</a> 
<a href="http://www.thedomain.com/test1">2nd link</a> 
<a href="http://www.thedomain.com/test2">3rd link</a> 
</h2>

Here is my code:

html_doc.xpath('//h2/a[contains(@href, "http://www.thedomain.com")]/text()')

This returns: Hey there, 2nd link, 3rd link

But I want http://www.thedomain.com/test and so on.


2015-11-01 fscore

Just select @href instead of text():

//h2/a[contains(@href, "http://www.thedomain.com")]/@href


2015-11-01 01:43:36 alecxe

You can also use a CSS selector for this (in this case it is probably easier to use than XPath). You can select the <a> elements under h2 with:

html_doc.css('h2 a')

Here is a complete working version of the code:

require 'nokogiri'

html = <<EOT
<html>
    <h2>
     <a href="http://www.thedomain.com/test">Hey there</a>
     <a href="http://www.thedomain.com/test1">2nd link</a>
     <a href="http://www.thedomain.com/test2">3rd link</a>
    </h2>
</html>
EOT

html_doc = Nokogiri::HTML(html)
# Print the href attribute of every <a> inside an <h2>
html_doc.css('h2 a').each { |link| puts link['href'] }
# http://www.thedomain.com/test
# http://www.thedomain.com/test1
# http://www.thedomain.com/test2


2015-11-01 01:51:23
