2012-03-11

Remove outer tags from elements with Nokogiri?

This is what I want to do:

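Remove every "span" node that has the class "none".

Remove each "extra" node, but keep the text inside it.

Remove any "br" nodes and replace them with "p" nodes.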

<p class="normal"> 
    <span class="none"> 
     <extra>Some text goes here</extra> 
    </span> 
    <span class="none"> 
     <br/> 
    </span> 
    <span class="none"> 
     <extra>Some other text goes here</extra> 
     <br/> 
    </span> 
</p> 

This is the output I want to achieve:

<p class="normal">Some text goes here</p> 
<p class="normal">Some other text goes here</p> 

So far, I have tried this:

doc.xpath('html/body/p/span').each do |span| 
    span.attribute_nodes.each do |a| 
     if a.value == "none" 
      span.children.each do |child| 
      span.parent << child 
      end 
      span.remove 
     end 
    end 
end 

However, this is the output I get, and it is not even in the right order:

<p class="normal"><br /><br />Some text goes hereSome other text goes here</p> 

Answers

Try this:

require 'rubygems' 
require 'nokogiri' 

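# DATA reads everything that follows __END__ in this script file.
doc = Nokogiri::XML(DATA)

# Unwrap each span.none and each extra: swap the node for its own children.
doc.css("span.none, extra").each do |span|
    span.swap(span.children)
end

# via http://stackoverflow.com/questions/8937846/how-do-i-wrap-html-untagged-text-with-p-tag-using-nokogiri
# Select text nodes that have a <br> sibling before or after them,
# and wrap the ones that are not whitespace-only in a new <p>.
doc.search("//br/preceding-sibling::text()|//br/following-sibling::text()").each do |node|
    if node.content !~ /\A\s*\Z/
        node.replace(doc.create_element('p', node))
    end
end

# Finally, drop the <br> nodes themselves.
doc.css('br').remove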

puts doc 

__END__ 
<p class="normal"> 
    <span class="none"> 
     <extra>Some text goes here</extra> 
    </span> 
    <span class="none"> 
     <br/> 
    </span> 
    <span class="none"> 
     <extra>Some other text goes here</extra> 
     <br/> 
    </span> 
</p> 

It prints:

<?xml version="1.0"?> 
<p class="normal"> 

     <p>Some text goes here</p> 





     <p>Some other text goes here</p> 


</p> 
+1

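Thanks, this is very helpful. I learned a lot from your post. I didn't know about the DATA constant or the !~ operator... although I'm not sure I understand all of the XPath. – 2012-03-13 14:33:09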