2011-09-13 35 views
1

Nokogiri: replace inner text with <span>-wrapped words

Here is a sample HTML fragment:

<p class="stanza">Thus grew the tale of Wonderland:<br/> 
    Thus slowly, one by one,<br/> 
    Its quaint events were hammered out -<br/> 
    And now the tale is done,<br/> 
    And home we steer, a merry crew,<br/> 
    Beneath the setting sun.<br/></p> 

I need to wrap each word in a <span>, such as <span id="w0">Thus </span>, like this:

<span id='w1'>Anon,</span> <span id='w2'>to</span> <span id='w3'>sudden</span> 
<span id='w4'>silence</span> <span id='w5'>won,</span> .... 

I wrote the following, which builds the new fragment. How do I swap it in to replace the old text?

def callchildren(n) 
    # recurse until we arrive at a node without children; the block
    # variable gets its own name so it cannot clobber the parameter n
    n.children.each do |child| 
        callchildren(child) 
    end 
    if n.node_type == 3 && !n.to_s.strip.empty? # 3 == text node 
        new_node = "" 
        n.to_s.split.each { |w| 
            new_node = new_node + "<span id='w#{$word_number}'>#{w}</span> " 
            $word_number += 1 
        } 
        # puts new_node 
        # HELP? How do I get new_node swapped in? 
    end 
end 

Answers

2

Here is my attempt at a solution to your problem, given doc, a Nokogiri::HTML::Document:

require 'nokogiri' 

Inf = 1.0/0.0 

def number_words(node, counter = nil) 
    # define infinite counter (Ruby >= 1.8.7) 
    counter ||= (1..Inf).each 
    doc = node.document 

    unless node.is_a?(Nokogiri::XML::Text) 
        # recurse for children and collect all the returned 
        # nodes into an array 
        children = node.children.inject([]) { |acc, child| 
            acc + number_words(child, counter) 
        } 
        # replace the node's children 
        node.children = Nokogiri::XML::NodeSet.new(doc, children) 
        return [node] 
    end 

    # for text nodes, we generate a list of span nodes 
    # and return it (this is more secure than OP's original 
    # approach that is vulnerable to HTML injection) 
    # node.content is the unescaped text; node.to_s would return 
    # escaped markup that span.content= then escapes a second time 
    node.content.split.inject([]) { |acc, word| 
        span = Nokogiri::XML::Node.new("span", doc) 
        span.content = word 
        span["id"] = "w#{counter.next}" 
        # add a space if we are not at the beginning 
        acc << Nokogiri::XML::Text.new(" ", doc) unless acc.empty? 
        # add our new span to the collection 
        acc << span 
    } 
end 

# demo 
if __FILE__ == $0 
    h = <<-HTML 
    <p class="stanza">Thus grew the tale of Wonderland:<br/> 
    Thus slowly, one by one,<br/> 
    Its quaint events were hammered out -<br/> 
    And now the tale is done,<br/> 
    And home we steer, a merry crew,<br/> 
    Beneath the setting sun.<br/></p> 
    HTML 

    doc = Nokogiri::HTML.parse(h) 
    # start at the root element rather than the document node itself 
    number_words(doc.root) 
    puts doc.to_html 
end 
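
The shared counter is what keeps the ids unique across recursive calls: calling each on a range without a block returns an Enumerator, and every call to next hands out the following integer. A minimal sketch of just that mechanism, with no Nokogiri involved; label_words is a hypothetical helper made up for this illustration, and Float::INFINITY (Ruby 1.9.2 and later) stands in for the Inf constant above:

```ruby
# Hypothetical helper (not part of the answer): labels each word with
# the next number drawn from a shared enumerator.
def label_words(words, counter)
  words.map { |w| "w#{counter.next}:#{w}" }
end

# Range#each without a block returns an Enumerator; #next advances it.
counter = (1..Float::INFINITY).each

first  = label_words(%w[Thus grew], counter)
second = label_words(%w[the tale], counter)

p first   # => ["w1:Thus", "w2:grew"]
p second  # => ["w3:the", "w4:tale"]
```

Because both calls receive the same enumerator object, the second call continues where the first stopped, which is how number_words numbers words across sibling and nested nodes.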

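I'm new to Ruby, and I must say I really like the infinite-loop construct and the use of .inject. Damn elegant. I think my problem was that I had not "fully" embraced Nokogiri and realized I needed to add a node. Thanks very much. And I'm smarter for it. – Charlie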

1

You could do something like this:

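i = 0 
doc.search('//p[@class="stanza"]/text()').each do |n| 
    # wrap every run of non-whitespace characters in a numbered span 
    spans = n.content.scan(/\S+/).map do |s| 
        "<span id=\"w#{i += 1}\">#{s}</span>" 
    end 
    # replace the text node with the parsed markup string 
    n.replace(spans.join(' ')) 
end 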
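
One caveat with building the markup as a string: the text read from a node comes back unescaped, so a word containing &, < or > would be parsed as markup once the string is swapped back in. A hedged sketch of one way to guard against that, using CGI.escapeHTML from Ruby's standard library; span_markup is a hypothetical helper name, not something from Nokogiri or from the answer above:

```ruby
require 'cgi'

# Hypothetical helper: builds the span markup for one run of text,
# escaping each word so it cannot be interpreted as HTML.
def span_markup(text, start = 1)
  n = start - 1
  text.scan(/\S+/).map { |word|
    %(<span id="w#{n += 1}">#{CGI.escapeHTML(word)}</span>)
  }.join(' ')
end

puts span_markup("Tom & Jerry <3")
# prints: <span id="w1">Tom</span> <span id="w2">&amp;</span> <span id="w3">Jerry</span> <span id="w4">&lt;3</span>
```

The resulting string could then be passed to n.replace in place of spans.join(' ').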

This works too, but it is not as general across different, more complex HTML layouts. Thanks very much for your help. – Charlie


Very nice solution. –