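Find all descendant text() nodes except those in nested sections

My XML document has arbitrarily nested sections. Given a reference to a particular section, I need to find all the text nodes in that section, not including those in its subsections.

For example, given a reference to the #a1 node below, I need to find only the "A1 " and "A1" text nodes: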

<root> 
    <section id="a1"> 
    <b>A1 <c>A1</c></b> 
    <b>A1 <c>A1</c></b> 
    <section id="a1.1"> 
     <b>A1.1 <c>A1.1</c></b> 
    </section> 
    <section id="a1.2"> 
     <b>A1.2 <c>A1.2</c></b> 
     <section id="a1.2.1"> 
     <b>A1.2.1</b> 
     </section> 
     <b>A1.2 <c>A1.2</c></b> 
    </section> 
    </section> 
    <section id="a2"> 
    <b>A2 <c>A2</c></b> 
    </section> 
</root>

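In case it is not obvious, the above is made-up data. In particular, the id attributes may not exist in the real-world document.

The best I have come up with so far is to find all text nodes within the section and then use Ruby to subtract the ones I don't want: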

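require 'nokogiri'

# All text nodes under `node`, minus those inside any nested <section>.
def own_text(node)
  node.xpath('.//text()') - node.xpath('.//section//text()')
end

# `mydoc` holds the XML shown above; noblanks drops whitespace-only text nodes.
doc = Nokogiri.XML(mydoc, &:noblanks)
p own_text(doc.at("#a1")).length #=> 4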

Can I craft a single XPath 1.0 expression that finds these nodes directly? Something like:

.//text()[ancestor::section = self] # self being the original context node


2012-05-25 Phrogz

Use (for the section whose id attribute has the string value "a1"):

//section[@id='a1'] 
     //*[normalize-space(text()) and ancestor::section[1]/@id = 'a1']/text()

XSLT-based verification:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
<xsl:output omit-xml-declaration="yes" indent="yes"/> 
<xsl:strip-space elements="*"/> 

<xsl:template match="/"> 
    <xsl:copy-of select= 
     "//section[@id='a1'] 
      //*[normalize-space(text()) and ancestor::section[1]/@id = 'a1'] 
    "/> 
</xsl:template> 
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<root> 
    <section id="a1"> 
     <b>A1 
      <c>A1</c> 
     </b> 
     <b>A1 
      <c>A1</c> 
     </b> 
     <section id="a1.1"> 
      <b>A1.1 
       <c>A1.1</c> 
      </b> 
     </section> 
     <section id="a1.2"> 
      <b>A1.2 
       <c>A1.2</c> 
      </b> 
      <section id="a1.2.1"> 
       <b>A1.2.1</b> 
      </section> 
      <b>A1.2 
       <c>A1.2</c> 
      </b> 
     </section> 
    </section> 
    <section id="a2"> 
     <b>A2 
      <c>A2</c> 
     </b> 
    </section> 
</root>

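it evaluates the XPath expression (selecting just the parents of the wanted text nodes, so that the results are clearly visible) and copies the selected nodes to the output: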

<b>A1 
      <c>A1</c> 
</b> 
<c>A1</c> 
<b>A1 
      <c>A1</c> 
</b> 
<c>A1</c>

UPDATE: In case section elements can have the same id attribute (or no id attribute at all), use:

 (//section)[1] 
      //*[normalize-space(text()) 
      and 
       count(ancestor::section) 
      = 
       count((//section)[1]/ancestor::section) +1]/text()

XSLT-based verification:

<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
    <xsl:output omit-xml-declaration="yes" indent="yes"/> 
    <xsl:strip-space elements="*"/> 

    <xsl:template match="/"> 
     <xsl:copy-of select= 
      "(//section)[1] 
       //*[normalize-space(text()) 
       and 
        count(ancestor::section) 
       = 
        count((//section)[1]/ancestor::section) +1] 
     "/> 
    </xsl:template> 
</xsl:stylesheet>

Transformation result (the same):

<b>A1 
      <c>A1</c> 
</b> 
<c>A1</c> 
<b>A1 
      <c>A1</c> 
</b> 
<c>A1</c>

This selects exactly the same wanted text nodes.


2012-05-26 02:20:37

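Can you do this without relying on the 'id' attribute? This is just a demo document, meant to make the problem easy to illustrate and discuss. Imagine nested '<section>' elements with no distinguishing attributes. – Phrogz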

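Yes, see the update to this answer. –

Nice; I had forgotten about using 'count()', but even once you started using it I could not figure out how to "store" the count. This still won't work directly in Ruby/XPath (because when starting from a new context node, the only node available is '.'), but it does seem to answer the question for generic XPath. – Phrogz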

Use:

//text()[ancestor::section[1]/@id = 'a1']


2012-05-25 23:41:57

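This only works if every section has a unique 'id' attribute. That is the case in my sample data above, but it is not a general solution. +1, but not accepted. – Phrogz

@Phrogz: If that is the case, you need to specify it in the text of the question. You also need to specify how the particular "section" is selected, because that is a necessary prefix of the required XPath expression. See my answer for a solution that does not depend on the ids being unique. –

@Dimitre Any section can be uniquely selected via, e.g., '//section[27]' or (as is actually my case) 'doc.xpath('//section') ...use this particular section reference as the anchor for a new XPath expression... }' – Phrogz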
