Grabbing text between all tags in Nokogiri?

What is the most efficient way to grab all the text between HTML tags?

<div> 
<a> hi </a> 
....

A bunch of text surrounded by HTML tags.

Source

2009-10-03 KJW

Check out https://github.com/rgrove/sanitize too – Abram 2015-05-31 02:07:19

doc = Nokogiri::HTML(your_html) 
doc.xpath("//text()").to_s

Source

2009-10-03 05:38:39 khelll

Thanks! Works fine, +1 – rusllonrails 2017-11-25 13:42:43

Use a SAX parser. It is much faster than the XPath option.

require "nokogiri"

some_html = <<-HTML
<html>
  <head>
    <title>Title!</title>
  </head>
  <body>
    This is the body!
  </body>
</html>
HTML

# SAX handler that collects every non-blank run of text it is handed.
class TextHandler < Nokogiri::XML::SAX::Document
  attr_reader :chunks

  def initialize
    super
    @chunks = []
  end

  # Treat CDATA sections the same way as ordinary text.
  def cdata_block(string)
    characters(string)
  end

  # Called by the parser for each run of character data.
  def characters(string)
    text = string.strip
    @chunks << text unless text.empty?
  end
end

th = TextHandler.new
parser = Nokogiri::HTML::SAX::Parser.new(th)
parser.parse(some_html)
puts th.chunks.inspect

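Source

2009-10-10 17:34:10

How could this be changed to get only the text between the body tags? – Omnipresent 2010-12-11 16:27:53

Set a flag: start capturing characters only after you have seen the body tag open, and stop capturing once the body tag closes. – 2010-12-13 00:35:11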

Here is how to get all the text in the question div of this page:

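require 'nokogiri'
require 'open-uri'

# URI.open replaces the bare Kernel#open, which no longer accepts URLs in Ruby 3.
doc = Nokogiri::HTML(URI.open("http://stackoverflow.com/questions/1512850/grabbing-text-between-all-tags-in-nokogiri"))
# .text returns the text content; .to_s would print the div's HTML markup instead.
puts doc.css("#question").text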

Source

2009-10-14 04:44:29 pjb3

Just do:

doc = Nokogiri::HTML(your_html) 
doc.xpath("//text()").text

Source

2013-01-06 21:02:10 arturodz
