Comparing 2 node sets based on attribute sequence

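I'm trying to build a kind of library XML by comparing various nodes and combining them for later use. The logic should be fairly simple: if the sequence of tag_XX attribute values for one language equals the sequence of tag_YY attribute values for another language, the nodes can be combined. See the XML example below: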

<Book> 
<Section> 
    <GB> 
     <Para tag_GB="L1"> 
      <Content_GB>string_1</Content_GB> 
     </Para> 
     <Para tag_GB="Illanc"> 
      <Content_GB>string_2</Content_GB> 
     </Para> 
     <Para tag_GB="|PLB"> 
      <Content_GB>string_3</Content_GB> 
     </Para> 
     <Para tag_GB="L1"> 
      <Content_GB>string_4</Content_GB> 
     </Para> 
     <Para tag_GB="Sub"> 
      <Content_GB>string_5</Content_GB> 
     </Para> 
     <Para tag_GB="L3"> 
      <Content_GB>string_6</Content_GB> 
     </Para> 
     <Para tag_GB="Subbull"> 
      <Content_GB>string_7</Content_GB> 
     </Para> 
    </GB> 
    <!-- German translations - OK because same attribute sequence --> 
    <DE> 
     <Para tag_DE="L1"> 
      <Content_DE>German_translation of_string_1</Content_DE> 
     </Para> 
     <Para tag_DE="Illanc"> 
      <Content_DE>German_translation of_string_2</Content_DE> 
     </Para> 
     <Para tag_DE="|PLB"> 
      <Content_DE>German_translation of_string_3</Content_DE> 
     </Para> 
     <Para tag_DE="L1"> 
      <Content_DE>German_translation of_string_4</Content_DE> 
     </Para> 
     <Para tag_DE="Sub"> 
      <Content_DE>German_translation of_string_5</Content_DE> 
     </Para> 
     <Para tag_DE="L3"> 
      <Content_DE>German_translation of_string_6</Content_DE> 
     </Para> 
     <Para tag_DE="Subbull"> 
      <Content_DE>German_translation of_string_7</Content_DE> 
     </Para> 
    </DE> 
    <!-- Danish translations - NG because not same attribute sequence --> 
    <DK> 
     <Para tag_DK="L1"> 
      <Content_DK>Partial_Danish_translation_of_string_1</Content_DK> 
     </Para> 
     <Para tag_DK="L1_sub"> 
      <Content_DK>Partial_Danish_translation_of_string_1</Content_DK> 
     </Para> 
     <Para tag_DK="Illanc"> 
      <Content_DK>Danish_translation_of_string_2</Content_DK> 
     </Para> 
     <Para tag_DK="L1"> 
      <Content_DK>Danish_translation_of_string_4</Content_DK> 
     </Para> 
     <Para tag_DK="|PLB"> 
      <Content_DK>Danish_translation_of_string_3</Content_DK> 
     </Para> 
     <Para tag_DK="L3"> 
      <Content_DK>Danish_translation_of_string_6</Content_DK> 
     </Para> 
     <Para tag_DK="Sub"> 
      <Content_DK>Danish_translation_of_string_5</Content_DK> 
     </Para> 
     <Para tag_DK="Subbull"> 
      <Content_DK>Danish_translation_of_string_7</Content_DK> 
     </Para> 
    </DK> 
</Section> 
</Book>

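So:

GB tag_GB value sequence = L1 -> Illanc -> ... -> Subbull

DE tag_DE value sequence = L1 -> Illanc -> ... -> Subbull (same as GB, so OK)

DK tag_DK value sequence = L1 -> L1_sub -> oops, Illanc was expected, so this sequence differs from GB's and the locale can be ignored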

Because the German and English node sets have the same attribute sequence, I'd like to merge them as follows:

<Book> 
<Dictionary> 
    <Para tag="L1"> 
     <Content_GB>string_1</Content_GB> 
     <Content_DE>German_translation of_string_1</Content_DE> 
    </Para> 
    <Para tag="Illanc"> 
     <Content_GB>string_2</Content_GB> 
     <Content_DE>German_translation of_string_2</Content_DE> 
    </Para> 
    <Para tag="|PLB"> 
     <Content_GB>string_3</Content_GB> 
     <Content_DE>German_translation of_string_3</Content_DE> 
    </Para> 
    <Para tag="L1"> 
     <Content_GB>string_4</Content_GB> 
     <Content_DE>German_translation of_string_4</Content_DE> 
    </Para> 
    <Para tag="Sub"> 
     <Content_GB>string_5</Content_GB> 
     <Content_DE>German_translation of_string_5</Content_DE> 
    </Para> 
    <Para tag="L3"> 
     <Content_GB>string_6</Content_GB> 
     <Content_DE>German_translation of_string_6</Content_DE> 
    </Para> 
    <Para tag="Subbull"> 
     <Content_GB>string_7</Content_GB> 
     <Content_DE>German_translation of_string_7</Content_DE> 
    </Para> 
</Dictionary> 
</Book>

The stylesheet I'm using is as follows:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
<xsl:output method="xml" version="1.0" xmlns="http://www.w3.org/1999/xhtml" encoding="UTF-8" indent="yes"/> 
<xsl:output omit-xml-declaration="yes" indent="yes"/> 
<xsl:template match="/"> 
    <xsl:copy> 
     <xsl:apply-templates select="@* | node()"/> 
    </xsl:copy> 
</xsl:template> 
<xsl:template match="@* | node()"> 
    <xsl:copy> 
     <xsl:apply-templates select="@* | node()"/> 
    </xsl:copy> 
</xsl:template> 
<xsl:template match="text()"> 
    <xsl:value-of select="normalize-space(.)"/> 
</xsl:template> 
<xsl:template match="Section"> 
    <!-- store reference tag list --> 
    <xsl:variable name="Ref_tagList" select="GB/Para/attribute()[1]"/> 
    <Dictionary> 
     <xsl:for-each select="GB/Para"> 
      <xsl:variable name="pos" select="position()"/> 
      <Para tag="{@tag_GB}"> 
       <!-- Copy English Master --> 
       <xsl:apply-templates select="element()[1]"/> 
       <xsl:for-each select="//Book/Section/element()[not(self::GB)]"> 
        <!-- store current locale tag list --> 
        <xsl:variable name="Curr_tagList" select="Para/attribute()[1]"/> 
        <xsl:if test="$Ref_tagList = $Curr_tagList"> 
         <!-- Copy current locale is current tag list equals reference tag list --> 
         <xsl:apply-templates select="Para[position()=$pos]/element()[1]"/> 
        </xsl:if> 
       </xsl:for-each> 
      </Para> 
     </xsl:for-each> 
    </Dictionary> 
</xsl:template> 
</xsl:stylesheet>

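Apart from probably not being the most efficient way to do this (I'm fairly new to the XSLT game...), it also doesn't work. The logic I had in mind was to take the attribute set of the English master: if the attribute set of any other locale is equal, I copy it; if not, I ignore it. But for some reason, node sets with a different attribute sequence are also happily copied (as shown below). Can anyone tell me where my logic conflicts with reality? Thanks in advance!

The current output includes the Danish, which should have been ignored...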

<Book> 
<Dictionary> 
    <Para tag="L1"> 
     <Content_GB>string_1</Content_GB> 
     <Content_DE>German_translation of_string_1</Content_DE> 
     <Content_DK>Partial_Danish_translation_of_string_1</Content_DK> 
    </Para> 
    <Para tag="Illanc"> 
     <Content_GB>string_2</Content_GB> 
     <Content_DE>German_translation of_string_2</Content_DE> 
     <Content_DK>Partial_Danish_translation_of_string_1</Content_DK> 
    </Para> 
    <Para tag="|PLB"> 
     <Content_GB>string_3</Content_GB> 
     <Content_DE>German_translation of_string_3</Content_DE> 
     <Content_DK>Danish_translation_of_string_2</Content_DK> 
    </Para> 
    <Para tag="L1"> 
     <Content_GB>string_4</Content_GB> 
     <Content_DE>German_translation of_string_4</Content_DE> 
     <Content_DK>Danish_translation_of_string_4</Content_DK> 
    </Para> 
    <Para tag="Sub"> 
     <Content_GB>string_5</Content_GB> 
     <Content_DE>German_translation of_string_5</Content_DE> 
     <Content_DK>Danish_translation_of_string_3</Content_DK> 
    </Para> 
    <Para tag="L3"> 
     <Content_GB>string_6</Content_GB> 
     <Content_DE>German_translation of_string_6</Content_DE> 
     <Content_DK>Danish_translation_of_string_6</Content_DK> 
    </Para> 
    <Para tag="Subbull"> 
     <Content_GB>string_7</Content_GB> 
     <Content_DE>German_translation of_string_7</Content_DE> 
     <Content_DK>Danish_translation_of_string_5</Content_DK> 
    </Para> 
</Dictionary> 
</Book>

2011-07-14 Wokoman

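Do you only want to group the ones in sequence? Or the matching groups? – Treemonkey

The complete 'Section' has to match. In practice there are many more content strings in a group and a lot of variation in the tags. So let's say the GB section has 50 paragraphs; the German section should then also have 50 paragraphs, with the attributes in exactly the same order. – Wokoman

Also, what if there is another language whose Para element sequence matches DK? Which sequence pattern should be used? Or do you want two Dictionary elements, (GB, DE) and (DK, XX)? –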

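This is probably not the best solution. I've used the following XSLT 2.0 features:

I compared the attribute sequences using string-join().
I make use of

Your problem can probably be solved with further XSLT 2.0 facilities, possibly using RTF variables. But I think the BIG problem here is your input document.

Sorry, I didn't look at your current transform. I just implemented it from scratch. Hope it helps: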

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
    <xsl:output indent="yes"/> 
    <xsl:strip-space elements="*"/> 

    <xsl:template match="GB"> 
     <Book> 
      <Dictionary> 

       <xsl:variable name="matches"> 
        <xsl:for-each select="following-sibling::* 
         [string-join(Para/@*,'-') 
         = string-join(current()/Para/@*,'-')]"> 
         <match><xsl:copy-of select="Para/*"/></match> 
        </xsl:for-each> 
       </xsl:variable> 

       <xsl:apply-templates select="Para"> 
        <xsl:with-param name="matches" select="$matches"/> 
       </xsl:apply-templates> 

      </Dictionary> 
     </Book> 
    </xsl:template> 

    <xsl:template match="Para[parent::GB]"> 
     <xsl:param name="matches"/> 
     <xsl:variable name="pos" select="position()"/> 
     <Para tag="{@tag_GB}"> 
      <xsl:copy-of select="Content_GB"/> 
      <xsl:copy-of select="$matches/match/*[position()=$pos]"/> 
     </Para> 
    </xsl:template> 

    <xsl:template match="text()"/> 

</xsl:stylesheet>

When applied to the input document provided in the question, the following output is produced:

<Book> 
    <Dictionary> 
     <Para tag="L1"> 
     <Content_GB>string_1</Content_GB> 
     <Content_DE>German_translation of_string_1</Content_DE> 
     </Para> 
     <Para tag="Illanc"> 
     <Content_GB>string_2</Content_GB> 
     <Content_DE>German_translation of_string_2</Content_DE> 
     </Para> 
     <Para tag="|PLB"> 
     <Content_GB>string_3</Content_GB> 
     <Content_DE>German_translation of_string_3</Content_DE> 
     </Para> 
     <Para tag="L1"> 
     <Content_GB>string_4</Content_GB> 
     <Content_DE>German_translation of_string_4</Content_DE> 
     </Para> 
     <Para tag="Sub"> 
     <Content_GB>string_5</Content_GB> 
     <Content_DE>German_translation of_string_5</Content_DE> 
     </Para> 
     <Para tag="L3"> 
     <Content_GB>string_6</Content_GB> 
     <Content_DE>German_translation of_string_6</Content_DE> 
     </Para> 
     <Para tag="Subbull"> 
     <Content_GB>string_7</Content_GB> 
     <Content_DE>German_translation of_string_7</Content_DE> 
     </Para> 
    </Dictionary> 
</Book>
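
A side note on why comparing with string-join() matters here. In XPath 2.0, `=` applied to two sequences is a general comparison: it is true as soon as any one item on the left equals any one item on the right. A test such as `$Ref_tagList = $Curr_tagList` in the question's stylesheet therefore succeeds for DK, because GB and DK share at least one value (for example `L1`). Below is a minimal sketch of the question's Section template with each sequence collapsed to a single string before comparing. The variable names `ref-key` and `same-seq` are made up for illustration, and the `-` separator is assumed never to occur inside a tag value.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy everything not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Section">
    <!-- Reference key: the GB tag values joined in document order
         (hypothetical variable name) -->
    <xsl:variable name="ref-key" select="string-join(GB/Para/@tag_GB, '-')"/>
    <!-- Locales whose own tag_* sequence joins to exactly the same string
         (hypothetical variable name) -->
    <xsl:variable name="same-seq"
        select="*[not(self::GB)]
                 [string-join(Para/@*[starts-with(local-name(), 'tag_')], '-')
                  = $ref-key]"/>
    <Dictionary>
      <xsl:for-each select="GB/Para">
        <xsl:variable name="pos" select="position()"/>
        <Para tag="{@tag_GB}">
          <!-- English master content -->
          <xsl:copy-of select="*"/>
          <!-- Content of the Para at the same position in each matching locale -->
          <xsl:copy-of select="$same-seq/Para[$pos]/*"/>
        </Para>
      </xsl:for-each>
    </Dictionary>
  </xsl:template>
</xsl:stylesheet>
```

Here both sides of `=` are single strings, so the comparison is true only when the whole sequence matches, in order and in length. With the sample input this should keep DE and skip DK.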

2011-07-14 16:12:25

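Now improved by removing one iteration. –

Thanks Empo, I really appreciate the effort. In the end I used an adaptation of Mads's, but the approaches are very similar. Learned something valuable again today! – Wokoman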

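This stylesheet makes use of <xsl:for-each-group>.

First, it groups the elements by the sequence of their Para/@* values.
Then, for each of those sequences, it groups the Para elements by the number of following-sibling elements that have an attribute starting with "tag".

I put a predicate filter on the match for @* to ensure that only attributes starting with "tag_" are compared. This may not be necessary, but it helps ensure the stylesheet still works if other attributes are added to the instance XML.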

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
    <xsl:output method="xml" version="1.0" xmlns="http://www.w3.org/1999/xhtml" encoding="UTF-8" 
     indent="yes"/> 
    <xsl:output omit-xml-declaration="yes" indent="yes"/> 

    <xsl:template match="@* | node()"> 
     <xsl:copy> 
      <xsl:apply-templates select="@* | node()"/> 
     </xsl:copy> 
    </xsl:template> 

    <xsl:template match="text()" priority="1"> 
     <xsl:value-of select="normalize-space(.)"/> 
    </xsl:template> 

    <xsl:template match="Section"> 
     <xsl:for-each-group select="*" 
      group-adjacent="string-join(
      Para/@*[starts-with(local-name(),'tag_')],'|')"> 
      <Dictionary> 
       <xsl:for-each-group select="current-group()/Para" 
        group-by="count(
        following-sibling::*[@*[starts-with(local-name(),'tag_')]])"> 
        <Para tag="{(current-group()/@*[starts-with(local-name(),'tag_')])[1]}"> 
         <xsl:copy-of select="current-group()/*"/> 
        </Para> 
       </xsl:for-each-group> 
      </Dictionary> 
     </xsl:for-each-group> 
    </xsl:template> 

</xsl:stylesheet>

When applied to the sample input XML, the following output is produced:

<Book> 
    <Dictionary> 
     <Para tag="L1"> 
     <Content_GB>string_1</Content_GB> 
     <Content_DE>German_translation of_string_1</Content_DE> 
     </Para> 
     <Para tag="Illanc"> 
     <Content_GB>string_2</Content_GB> 
     <Content_DE>German_translation of_string_2</Content_DE> 
     </Para> 
     <Para tag="|PLB"> 
     <Content_GB>string_3</Content_GB> 
     <Content_DE>German_translation of_string_3</Content_DE> 
     </Para> 
     <Para tag="L1"> 
     <Content_GB>string_4</Content_GB> 
     <Content_DE>German_translation of_string_4</Content_DE> 
     </Para> 
     <Para tag="Sub"> 
     <Content_GB>string_5</Content_GB> 
     <Content_DE>German_translation of_string_5</Content_DE> 
     </Para> 
     <Para tag="L3"> 
     <Content_GB>string_6</Content_GB> 
     <Content_DE>German_translation of_string_6</Content_DE> 
     </Para> 
     <Para tag="Subbull"> 
     <Content_GB>string_7</Content_GB> 
     <Content_DE>German_translation of_string_7</Content_DE> 
     </Para> 
    </Dictionary> 
    <Dictionary> 
     <Para tag="L1"> 
     <Content_DK>Partial_Danish_translation_of_string_1</Content_DK> 
     </Para> 
     <Para tag="L1_sub"> 
     <Content_DK>Partial_Danish_translation_of_string_1</Content_DK> 
     </Para> 
     <Para tag="Illanc"> 
     <Content_DK>Danish_translation_of_string_2</Content_DK> 
     </Para> 
     <Para tag="L1"> 
     <Content_DK>Danish_translation_of_string_4</Content_DK> 
     </Para> 
     <Para tag="|PLB"> 
     <Content_DK>Danish_translation_of_string_3</Content_DK> 
     </Para> 
     <Para tag="L3"> 
     <Content_DK>Danish_translation_of_string_6</Content_DK> 
     </Para> 
     <Para tag="Sub"> 
     <Content_DK>Danish_translation_of_string_5</Content_DK> 
     </Para> 
     <Para tag="Subbull"> 
     <Content_DK>Danish_translation_of_string_7</Content_DK> 
     </Para> 
    </Dictionary> 
</Book>
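
The output above still contains a second Dictionary holding only the Danish content, whereas the question asked for non-matching locales to be ignored. The thread does not show the adapted version the asker ended up using, so the following is only one possible sketch. It assumes GB is always the master locale and emits a Dictionary only for the group that contains GB.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template for everything not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Section">
    <xsl:for-each-group select="*"
        group-adjacent="string-join(
                          Para/@*[starts-with(local-name(), 'tag_')], '|')">
      <!-- Assumption: only the group that includes the GB master is wanted;
           groups made up solely of other locales (DK here) are skipped -->
      <xsl:if test="current-group()[self::GB]">
        <Dictionary>
          <xsl:for-each-group select="current-group()/Para"
              group-by="count(
                following-sibling::*[@*[starts-with(local-name(), 'tag_')]])">
            <Para tag="{(current-group()/@*[starts-with(local-name(), 'tag_')])[1]}">
              <xsl:copy-of select="current-group()/*"/>
            </Para>
          </xsl:for-each-group>
        </Dictionary>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Note that `group-adjacent` only merges locales that sit next to each other in the Section. If a non-matching locale could appear between GB and a matching one, switching the outer grouping to `group-by` with the same key expression would be one way to handle that.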

2011-07-14 17:41:30

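Thanks Mads, works like a dream. You even anticipated that my source does in fact have more than one attribute :-) – Wokoman

+1 for the advanced use of 'xsl:for-each-group', even though I don't find it very intuitive. –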
