Extracting text while skipping some content

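I am trying to extract the text of a node of interest (here big-structured-text), but that node has some children I want to skip (here title, subtitle and code). Those "removed" nodes can themselves have children.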

Sample data:

<root> 
    <big-structured-text> 
     <section> 
      <title>Introduction</title> 
      In this part we describe Australian foreign policy.... 
      <subsection> 
       <subtitle>Historical context</subtitle> 
       After its independence... 
       <meta> 
        <keyword>foreign policy</keyword> 
        <keyword>australia</keyword> 
        <code> 
         <value>XXHY-123</value> 
         <label>IRRN</label> 
        </code> 
       </meta> 
      </subsection> 
     </section> 
    </big-structured-text> 
    <!-- ... --> 
    <big-structured-text> 
     <!-- ... --> 
    </big-structured-text> 
</root>

So far I have tried:

<xsl:for-each 
    select="//big-structured-text"> 
     <text> 
      <xsl:value-of select=".//*[not(*) 
       and not(ancestor-or-self::code) 
       and not(ancestor-or-self::subtitle) 
       and not(ancestor-or-self::title) 
       ]" /> 
     </text> 
</xsl:for-each>

But this only takes the nodes that have no children: it picks up keyword, but not the text that follows the Introduction title.

I have also tried:

<xsl:for-each 
    select="//big-structured-text"> 
     <text> 
      <xsl:value-of select=".//*[ 
       not(ancestor-or-self::code) 
       and not(ancestor-or-self::subtitle) 
       and not(ancestor-or-self::title) 
       ]" /> 
     </text> 
</xsl:for-each>

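But this outputs the interesting text several times, and sometimes the uninteresting text as well (each node is visited once for itself, and then once more for each of its ancestors).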

Source

2014-01-29 AsTeR

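Rather than iterating and selecting values as in the question, you can use templates to solve this problem. The default behaviour when templates are applied to element nodes is simply to apply them recursively to all of their child nodes (which includes text nodes as well as other elements), and for text nodes to output the text. So all you need to do is create empty templates to suppress the elements you don't want, and then let the default templates do the rest.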

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> 

    <xsl:template match="/"> 
    <root> 
     <xsl:apply-templates select="/root/big-structured-text" /> 
    </root> 
    </xsl:template> 

    <xsl:template match="big-structured-text"> 
    <text><xsl:apply-templates /></text> 
    </xsl:template> 

    <!-- empty template means anything inside any of these elements will be 
     ignored --> 
    <xsl:template match="title | subtitle | code" /> 
</xsl:stylesheet>

When run on your sample input, this produces

<?xml version="1.0"?> 
<root><text> 


      In this part we describe Australian foreign policy.... 


       After its independence... 

        foreign policy 
        australia 




    </text><text> 

    </text></root>

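You may want to look into using <xsl:strip-space> to get rid of some of the excess whitespace, but with mixed content you always have to be careful not to strip too much.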
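As a hedged illustration of the whitespace remark above, here is a minimal sketch of the same stylesheet with an xsl:strip-space declaration added. The choice of elements to list is an assumption made for this sample data, not part of the original answer: only containers that hold no meaningful text of their own are listed.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Assumption for this sketch: strip whitespace-only text nodes only
       inside elements that never carry text of their own. -->
  <xsl:strip-space elements="root big-structured-text" />

  <xsl:template match="/">
    <root>
      <xsl:apply-templates select="/root/big-structured-text" />
    </root>
  </xsl:template>

  <xsl:template match="big-structured-text">
    <text><xsl:apply-templates /></text>
  </xsl:template>

  <!-- Empty template: anything inside these elements is ignored. -->
  <xsl:template match="title | subtitle | code" />
</xsl:stylesheet>
```

section and subsection are left out on purpose because they hold mixed content. meta is left out as well: stripping the whitespace-only nodes between its keyword children would make "foreign policy" and "australia" run together in the output.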

Source

2014-01-29 18:50:33

Can you elaborate on which part of your code says to copy whatever is not inside the given tags? – AsTeR

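@AsTeR Nothing in my code says that explicitly - it is a result of the default rules I linked to in the first line. Those default rules basically amount to one template that applies templates to an element's children and one that outputs the value of a text node. What makes it work is that the explicit match="title | subtitle | code" template overrides the default "*" one. –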
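The default rules referred to in the comment above can be written out explicitly. The following is a sketch based on the built-in template rules described in the XSLT 1.0 recommendation, shown next to the answer's empty template; it illustrates the mechanism and is not a reconstruction of the code originally posted in the comment.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Built-in rule for the root node and elements: recurse into children. -->
  <xsl:template match="*|/">
    <xsl:apply-templates />
  </xsl:template>

  <!-- Built-in rule for text and attribute nodes: output their value. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="." />
  </xsl:template>

  <!-- Built-in rule for comments and processing instructions: output nothing. -->
  <xsl:template match="processing-instruction()|comment()" />

  <!-- Explicit empty template from the answer. A name test has default
       priority 0, which beats the -0.5 of "*", so these elements and
       everything inside them are skipped. -->
  <xsl:template match="title | subtitle | code" />
</xsl:stylesheet>
```

A processor applies the first three rules implicitly, so a real stylesheet normally omits them. Spelling them out shows why the text under section and subsection is still output while title, subtitle and code are not.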

OK, thanks for the explanation, I'll try it after a good night's sleep ;) – AsTeR
