How do I get email addresses from HTML code with Nokogiri?

How can I get email addresses from HTML code with Nokogiri? I'm thinking of using regular expressions, but I don't know whether that is the best solution.

Example code

<html> 
<title>Example</title> 
<body> 
This is an example text. 
<a href="mailto:[email protected]">Mail to me</a> 
</body> 
</html>

My question is whether Nokogiri provides a method for getting the mail address when it sits between certain tags.

Thanks

Source

2012-02-29 jgiunta

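To use Nokogiri, you'll want to know the class/id of the email field. – ScottJShea 2012-02-29 01:14:33

You need to show a sample of your HTML, along with the code you've tried. Without the HTML, any suggestion we make is worthless. The code lets us know what you've tried and helps us tailor our answers to your code. – 2012-02-29 01:51:10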

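You can extract the email addresses using XPath.

The selector //a selects every a tag on the page, and you can pick out the href attribute with the @ syntax, so //a/@href gives you the hrefs of all the a tags on the page.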

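If the page mixes a tags with different kinds of URL (e.g. http:// URLs), you can use XPath functions to narrow down the selected nodes further. The selector

//a[starts-with(@href, "mailto:")]/@href

gives you the href nodes of all a tags whose href attribute starts with "mailto:".

Putting this all together, and adding a little extra code to strip the "mailto:" from the start of the attribute value: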

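require 'nokogiri'

# XPath: the href attribute of every a tag whose href starts with "mailto:"
selector = "//a[starts-with(@href, \"mailto:\")]/@href"

# Parse the HTML file into a Nokogiri document
doc = Nokogiri::HTML.parse File.read 'my_file.html'

# Attribute nodes matching the selector
nodes = doc.xpath selector

# Drop the leading "mailto:" (7 characters) from each attribute value
addresses = nodes.collect {|n| n.value[7..-1]}

puts addresses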

With a test file that looks like this:

<html> 
<title>Example</title> 
<body> 
This is an example text. 
<a href="mailto:[email protected]">Mail to me</a> 
<a href="http://example.com">A Web link</a> 
<a>An empty anchor.</a> 
</body> 
</html>

This code prints the desired [email protected]. The addresses variable is an array of all the email addresses found in mailto links in the document.

Source

2012-02-29 21:21:41 matt

Try fetching the whole HTML page and running a regular expression over it.
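A minimal sketch of that idea in plain Ruby, assuming the page source is already in a string. The pattern is a deliberately simplified assumption and will miss or mis-match some valid addresses; the sample text and addresses are placeholders:

```ruby
# Simplified pattern: local part, "@", domain, dot, top-level domain.
email_pattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/

# Hypothetical page source; in practice this would be the fetched HTML.
html = <<~HTML
  <body>
    Contact alice@example.com or
    <a href="mailto:bob@example.com">Mail to me</a>
  </body>
HTML

# scan returns every match; uniq drops addresses that appear more than once.
addresses = html.scan(email_pattern).uniq

puts addresses
```

Unlike the XPath approach, this also picks up addresses that appear as plain text outside any link, at the cost of possible false matches.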

Source

2012-02-29 16:59:01 freeze

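I'll preface this by saying that I know nothing about Nokogiri. But I just went to their website and looked at the documentation, and it looks pretty cool.

If you add an email_field class (or whatever you want to call it) to your email links, you can modify their example code to do what you're looking for.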

require 'nokogiri'
require 'open-uri'

# Get a Nokogiri::HTML::Document for the page we're interested in...
# (URI.open comes from open-uri; a bare open no longer fetches URLs on Ruby 3.0+)

doc = Nokogiri::HTML(URI.open('http://www.yoursite.com/your_page.html'))

# Do funky things with it using Nokogiri::XML::Node methods...

####
# Search for nodes by css
doc.css('.email_field').each do |email|
  # assuming you have more than one, do something with all your email fields here
end

If I were you, I'd just look at their documentation and try some of their examples.

Here's the site: http://nokogiri.org/

Source

2012-02-29 20:11:06 PhillipKregg
