Get all preceding/following sibling text content

Consider the following XML:

<paratext ID="p34"><bold>pass</bold> <bold>pass</bold></paratext> 
<paratext ID="p35"><bold>pass</bold></paratext> 
<paratext ID="p36">foo <bold>pass</bold> bar</paratext> 
<paratext ID="p37">foo<bold> pass </bold>bar</paratext> 
<paratext ID="p38"><bold>fail</bold><bold>fail</bold></paratext> 
<paratext ID="p39">foo<bold>fail</bold>bar</paratext>

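p34 should pass because there is a non-alpha character between the text of the two bold tags
p35 should pass because there are no alpha characters outside the bold tag
p36 should pass because there are non-alpha characters between the bold text and the other text
p37 should pass because there are non-alpha characters between the bold text and the other text
p38 should fail because there are no non-alpha characters between the alpha characters of the two bold tags
p39 should fail because there are no non-alpha characters between the bold text and "foo" or "bar"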

My attempt to do this with Schematron so far looks like this:

<iso:rule context="//jd:csc|//jd:bold|//jd:ital|//jd:underscore"> 
<iso:assert test=" 
    string-length(preceding-sibling::text()) = 0 
    or  
    matches(substring(preceding-sibling::text(), string-length(preceding-sibling::text())), '[^a-zA-Z]') 
    or 
    matches(substring(.,1,1), '[^a-zA-Z]') 
    "> 
    {WS1046} An .alpha character cannot both immediately precede and follow &lt;<iso:value-of select="name()"/>&gt; tag 
</iso:assert> 
<iso:assert test=" 
    string-length(following-sibling::text()) = 0 
    or 
    matches(substring(following-sibling::text(), 1,1), '[^a-zA-Z]') 
    or 
    matches(substring(., string-length(.)), '[^a-zA-Z]') 
    "> 
    {WS1046} An .alpha character cannot both immediately precede and follow &lt;/<iso:value-of select="name()"/>&gt; tag 
</iso:assert> 
</iso:rule>

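The problem with this is that it only looks at the direct child text nodes of the current context's parent. As a result, p38 does not fail, because there are no direct child text nodes. In addition, something like b<foo>bar</foo> <bold>pass</bold> would fail, because it would only see the "b" in preceding-sibling::text() and would not see the text inside the <foo> element.

I have also tried ::*/text() instead of ::text(), but then I run into a similar problem: I only see the text inside sibling elements and no longer get the direct sibling text nodes. I need to combine these two things. Does anyone know how?

For example, in this XML: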

<paratext ID="p1">hello <foo>bar</foo> <bold>THIS</bold> <foo>bar</foo>goodbye</paratext>

When the rule's context hits <bold>THIS</bold> and checks what precedes it, I want it to see "hello bar ", and when it checks what follows, I want it to see " bargoodbye".

2013-11-22 smerny

With XPath 2.0 (which you appear to be using, since you call matches), you can use:

string-join(preceding-sibling::node(), '')

to get "hello bar ", and:

string-join(following-sibling::node(), '')

to get " bargoodbye".

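The expressions above assume that the only siblings are elements and text nodes. If there can also be comments and/or processing instructions, and you want to ignore their contents for these rules, you can use: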

string-join(preceding-sibling::* | preceding-sibling::text(), '')
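
To show how these expressions might slot into the rule from the question, here is a minimal sketch rather than a tested drop-in replacement. It assumes the schema declares queryBinding="xslt2" and that the iso: and jd: prefixes are bound as in the question. The variable names before and after are made up for this illustration.

```xml
<!-- Sketch only: assumes <iso:schema queryBinding="xslt2"> and the question's namespace bindings -->
<iso:rule context="//jd:csc|//jd:bold|//jd:ital|//jd:underscore">
  <!-- All sibling text before/after the context element, including text nested in sibling elements.
       "before" and "after" are illustrative names, not part of the original rule. -->
  <iso:let name="before" value="string-join(preceding-sibling::* | preceding-sibling::text(), '')"/>
  <iso:let name="after" value="string-join(following-sibling::* | following-sibling::text(), '')"/>

  <!-- Opening tag: passes if nothing precedes, or the last preceding character is non-alpha,
       or the first character inside the element is non-alpha -->
  <iso:assert test="
      string-length($before) = 0
      or matches(substring($before, string-length($before)), '[^a-zA-Z]')
      or matches(substring(., 1, 1), '[^a-zA-Z]')
      ">
      {WS1046} An .alpha character cannot both immediately precede and follow &lt;<iso:value-of select="name()"/>&gt; tag
  </iso:assert>

  <!-- Closing tag: passes if nothing follows, or the first following character is non-alpha,
       or the last character inside the element is non-alpha -->
  <iso:assert test="
      string-length($after) = 0
      or matches(substring($after, 1, 1), '[^a-zA-Z]')
      or matches(substring(., string-length(.)), '[^a-zA-Z]')
      ">
      {WS1046} An .alpha character cannot both immediately precede and follow &lt;/<iso:value-of select="name()"/>&gt; tag
  </iso:assert>
</iso:rule>
```

Traced by hand against the samples in the question, this should let p34 through p37 pass and make p38 and p39 fail. For p38, for example, the first bold element sees "fail" as its following text, and neither that string's first character nor the element's own last character is non-alpha.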

2013-11-22 17:02:57
