How to select only leaf nodes with Nokogiri?

I'm looking for advice on how to select only the leaf nodes with Nokogiri, ideally with a solution that uses XPath alone.

An HTML example:

<div> 
    <div> 
    <div>text div (leaf)</div> 
    <p>text paragraph (leaf)</p> 
    </div> 
</div> 
<p>text paragraph 2 (leaf)</p>

Code:

doc = Nokogiri::HTML.fragment("- the html above -") 
result = doc.xpath("*[not(child::*)]") 


[#<Nokogiri::XML::Element:0x3febf50f9328 name="p" children=[#<Nokogiri::XML::Text:0x3febf519b718 "text paragraph 2 (leaf)">]>]

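But this XPath only gives me the last "p". What I want is something like a flatten behaviour that returns only the leaf nodes.

Here are some related answers on Stack Overflow: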

How to select all leaf nodes using XPath expression?

XPath - Get node with no child of specific type

Thanks

Source

2013-07-26 Luccas

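What values do you want? –

All the nodes whose text contains (leaf) – Luccas

@Luccas: Do you want just the text, or do you want the containing elements? That is, do you want 'text paragraph (leaf)' or '<p>text paragraph (leaf)</p>'? If you want just the text, do you want all the text nodes separately, or do you just need all the text concatenated into a single string? – Borodin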

The problem with your code is this statement:

doc = Nokogiri::HTML.fragment("- the html above -")

See here:

require 'nokogiri' 

html = <<END_OF_HTML 
<div> 
    <div> 
    <div>text div (leaf)</div> 
    <p>text paragraph (leaf)</p> 
    </div> 
</div> 
<p>text paragraph 2 (leaf)</p> 
END_OF_HTML 


doc = Nokogiri::HTML(html) 
#doc = Nokogiri::HTML.fragment(html) 
results = doc.xpath("//*[not(child::*)]") 
results.each {|result| puts result} 

--output:-- 
<div>text div (leaf)</div> 
<p>text paragraph (leaf)</p> 
<p>text paragraph 2 (leaf)</p>

If I run this:

doc = Nokogiri::HTML.fragment(html) 
results = doc.xpath("//*[not(child::*)]") 
results.each {|result| puts result}

I get no output.

Source

2013-07-26 20:16:35 7stud

See https://github.com/sparklemotion/nokogiri/issues/213 and https://github.com/sparklemotion/nokogiri/issues/572 – Phrogz

You can find all element nodes that have no child elements with:

//*[not(*)]

For example:

require 'nokogiri' 

doc = Nokogiri::HTML.parse <<-end 
<div> 
    <div> 
    <div>text div (leaf)</div> 
    <p>text paragraph (leaf)</p> 
    </div> 
</div> 
<p>text paragraph 2 (leaf)</p> 
end 

puts doc.xpath('//*[not(*)]').length 
#=> 3 

doc.xpath('//*[not(*)]').each do |e| 
    puts e.text 
end 
#=> "text div (leaf)" 
#=> "text paragraph (leaf)" 
#=> "text paragraph 2 (leaf)"

Source

2013-07-26 20:14:37

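In XPath, text is itself a node. So, given your comment, you would want to select only the tag contents, not the tags that contain them, but you would also capture a <br/> (if there is one).

I guess you are looking for all elements that do not contain other elements (tags), which is not exactly what you asked for. In that case you are fine with @Justin Ko's answer and the XPath expression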

//*[not(*)]

If you really want to find all leaf nodes, you cannot use the * selector; you need to use node() instead:

//node()[not(node())]

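A node can be an element, but also a text node, a comment, a processing instruction, an attribute, or even the XML document itself (which cannot occur inside other elements).

If you really want only the text nodes, go for //text() as @Priti proposed. That is closer to what you described in your comment than to what leaf nodes are strictly defined as.

Source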

2013-07-26 21:38:37
