2013-01-09 108 views

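XSLT: Moving sibling text nodes into a selected node for an XLIFF fix

After several hours of studying XSLT, I admit defeat! I need to repair a large number of .xlf XLIFF translation files that were returned to us from an unnamed translation tool. Ideally, I would use a batch tool to apply an XSL transformation to them.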

Here is a snippet from one of the XLIFF files:

<body> 
    <trans-unit id="1" phase-name="pretrans" restype="x-h3"> 
     <source>Adding, Deleting or Modifying Notes in the Call Description</source> 
     <seg-source>Adding, Deleting or Modifying Notes in the Call Description</seg-source> 
     <target state="final">Добавление, удаление и изменение примечаний в описании звонка</target> 
    </trans-unit> 
    <trans-unit id="2" phase-name="pretrans" restype="x-p"> 
     <source>Description of Fields on RHS</source> 
     <seg-source>Description of Fields on RHS</seg-source> 
     <target state="final">Поле описания в правой части</target> 
    </trans-unit> 
    <trans-unit id="3" phase-name="pretrans" restype="x-p"> 
     <source>You can add descriptive text notes to a call recording, if you have the appropriate privileges to do so. These notes are visible to all users who have access to the call recording. It is recommended that each user add their initials to the notes to avoid potential confusion.</source> 
     <seg-source> 
      <mrk mtype="seg" mid="1">You can add descriptive text notes to a call recording, if you have the appropriate privileges to do so.</mrk> 
      <mrk mtype="seg" mid="2">These notes are visible to all users who have access to the call recording.</mrk> 
      <mrk mtype="seg" mid="3">It is recommended that each user add their initials to the notes to avoid potential confusion.</mrk> 
     </seg-source> 
     <target state="final"> 
      <mrk mtype="seg" mid="1" /><ph ctype="" id="1">&lt;MadCap:variable name="zoom_userdocs_variables.var_product_name" xmlns:MadCap="http://www.madcapsoftware.com/Schemas/MadCap.xsd" /&gt;</ph> позволяет находить телефонные взаимодействия, содержащие или не содержащие определенные фразы. 
      <mrk mtype="seg" mid="2" />Каждая речевая метка содержит одну или несколько таких фраз. 
      <mrk mtype="seg" mid="3" />Ядро <ph ctype="" id="3">&lt;MadCap:variable name="zoom_userdocs_variables.var_product_name" xmlns:MadCap="http://www.madcapsoftware.com/Schemas/MadCap.xsd" /&gt;</ph> индексирует медиафайлы и помечает места вхождения фразы (добавляет к ним метки). 
      <mrk mtype="seg" mid="4" />Затем нужные медиафайлы можно искать по связанным с ними меткам. 
     </target> 
    </trans-unit> 
    <trans-unit id="4" phase-name="pretrans" restype="x-p"> 
     <source>To add, delete, or modify text in the description field, click inside the description field.</source> 
     <seg-source>To add, delete, or modify text in the description field, click inside the description field.</seg-source> 
     <target state="final">Чтобы добавить, удалить или изменить текст в поле описания, щелкните это поле.</target> 
    </trans-unit> 
</body> 

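Notice the target tag in the third trans-unit node. The mrk tags should contain the text nodes, which have instead become their siblings (compare the seg-source tag just before it, which is still correct), and this messes up the structure.

So I am trying to find any mrk tags that contain no text node and then move the text nodes that follow them back inside.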

Here is the desired result:

<body> 
    <trans-unit id="1" phase-name="pretrans" restype="x-h3"> 
     <source>Adding, Deleting or Modifying Notes in the Call Description</source> 
     <seg-source>Adding, Deleting or Modifying Notes in the Call Description</seg-source> 
     <target state="final">Добавление, удаление и изменение примечаний в описании звонка</target> 
    </trans-unit> 
    <trans-unit id="2" phase-name="pretrans" restype="x-p"> 
     <source>Description of Fields on RHS</source> 
     <seg-source>Description of Fields on RHS</seg-source> 
     <target state="final">Поле описания в правой части</target> 
    </trans-unit> 
    <trans-unit id="3" phase-name="pretrans" restype="x-p"> 
     <source>You can add descriptive text notes to a call recording, if you have the appropriate privileges to do so. These notes are visible to all users who have access to the call recording. It is recommended that each user add their initials to the notes to avoid potential confusion.</source> 
     <seg-source> 
      <mrk mtype="seg" mid="1">You can add descriptive text notes to a call recording, if you have the appropriate privileges to do so.</mrk> 
      <mrk mtype="seg" mid="2">These notes are visible to all users who have access to the call recording.</mrk> 
      <mrk mtype="seg" mid="3">It is recommended that each user add their initials to the notes to avoid potential confusion.</mrk> 
     </seg-source> 
     <target state="final"> 
      <mrk mtype="seg" mid="1"><ph ctype="" id="1">&lt;MadCap:variable name="zoom_userdocs_variables.var_product_name" xmlns:MadCap="http://www.madcapsoftware.com/Schemas/MadCap.xsd" /&gt;</ph> позволяет находить телефонные взаимодействия, содержащие или не содержащие определенные фразы.</mrk> 
      <mrk mtype="seg" mid="2">Каждая речевая метка содержит одну или несколько таких фраз.</mrk> 
      <mrk mtype="seg" mid="3">Ядро <ph ctype="" id="3">&lt;MadCap:variable name="zoom_userdocs_variables.var_product_name" xmlns:MadCap="http://www.madcapsoftware.com/Schemas/MadCap.xsd" /&gt;</ph> индексирует медиафайлы и помечает места вхождения фразы (добавляет к ним метки).</mrk> 
      <mrk mtype="seg" mid="4">Затем нужные медиафайлы можно искать по связанным с ними меткам.</mrk> 
     </target> 
    </trans-unit> 
    <trans-unit id="4" phase-name="pretrans" restype="x-p"> 
     <source>To add, delete, or modify text in the description field, click inside the description field.</source> 
     <seg-source>To add, delete, or modify text in the description field, click inside the description field.</seg-source> 
     <target state="final">Чтобы добавить, удалить или изменить текст в поле описания, щелкните это поле.</target> 
    </trans-unit> 
</body> 

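I would normally do this in Perl with libxml or similar, but I'm sure this is a simple task for XSLT. I looked for a similar solution but could not find anything I could get to work.

One other thing to note: although it is pretty-printed here, in the actual files the whole body node is on a single line.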

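Thanks! I look forward to learning something new!

Edit: updated the source above to show further child tags inside the <target> element, which must be preserved. Edit 2: added the desired result.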

Answers

Try this XSLT:

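<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- Identity template: copy every attribute and node unchanged by default. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- A mrk inside target that has a following sibling text node. -->
    <xsl:template match="trans-unit/target/mrk[following-sibling::text()]">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
            <!-- In XSLT 1.0, value-of emits the string value of the first
                 following sibling text node only. -->
            <xsl:value-of select="following-sibling::text()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Drop every text node that is a direct child of target. -->
    <xsl:template match="trans-unit/target/text()"/>

</xsl:stylesheet>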

It should produce the desired result:

<body> 
    <trans-unit id="1" phase-name="pretrans" restype="x-h3"> 
     <source>Adding, Deleting or Modifying Notes in the Call Description</source> 
     <seg-source>Adding, Deleting or Modifying Notes in the Call Description</seg-source> 
     <target state="final" /> 
    </trans-unit> 
    <trans-unit id="2" phase-name="pretrans" restype="x-p"> 
     <source>Description of Fields on RHS</source> 
     <seg-source>Description of Fields on RHS</seg-source> 
     <target state="final" /> 
    </trans-unit> 
    <trans-unit id="3" phase-name="pretrans" restype="x-p"> 
     <source>You can add descriptive text notes to a call recording, if you have the appropriate privileges to do so. These notes are visible to all users who have access to the call recording. It is recommended that each user add their initials to the notes to avoid potential confusion.</source> 
     <seg-source> 
      <mrk mtype="seg" mid="1">You can add descriptive text notes to a call recording, if you have the appropriate privileges to do so.</mrk> 
      <mrk mtype="seg" mid="2">These notes are visible to all users who have access to the call recording.</mrk> 
      <mrk mtype="seg" mid="3">It is recommended that each user add their initials to the notes to avoid potential confusion.</mrk> 
     </seg-source> 
     <target state="final"><mrk mtype="seg" mid="1">При наличии соответствующих прав можно добавить описательные текстовые примечания к записи звонка. 
      </mrk><mrk mtype="seg" mid="2">Эти примечания видны для всех пользователей, которые имеют доступ к записи звонка. 
      </mrk><mrk mtype="seg" mid="3">Во избежание возможной путаницы каждому пользователю рекомендуется к примечаниям добавлять свои инициалы. 
     </mrk></target> 
    </trans-unit> 
    <trans-unit id="4" phase-name="pretrans" restype="x-p"> 
     <source>To add, delete, or modify text in the description field, click inside the description field.</source> 
     <seg-source>To add, delete, or modify text in the description field, click inside the description field.</seg-source> 
     <target state="final" /> 
    </trans-unit> 
</body> 

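Have you tested this? It will work, except that the output will contain extra whitespace, including newlines, after the text, which may not be acceptable. Also, any _correctly_ structured 'target/mrk' tags will be modified too, by moving the whitespace after the 'mrk' into its text node. Judicious use of 'normalize-space' should fix this.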


@JimGarrison, I provided the exact output. In any case, the OP can use the 'normalize-space' function.


@JimGarrison, it does not modify correct 'target/mrk' tags, only those that have following sibling text nodes.
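
For anyone landing here later, the points raised in this thread (keeping the ph child tags, leaving targets without mrk untouched, and trimming the trailing whitespace) could be combined roughly as in the XSLT 1.0 sketch below. It is only a sketch under some assumptions: the elements are in no namespace, as in the snippet above (if the real files declare the XLIFF namespace, the match patterns would need a prefix bound to it), and the broken mrk tags are completely empty. The mode name adopt and the template name rtrim are made up for illustration and do not come from the answer. It has not been run against the full files.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="UTF-8"/>

    <!-- Identity template: copy everything that no other template claims. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- An empty mrk adopts every sibling (text or element) that sits
         between itself and the next mrk. -->
    <xsl:template match="target/mrk[not(node())]">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates mode="adopt"
                select="following-sibling::node()[not(self::mrk)]
                        [generate-id(preceding-sibling::mrk[1]) = generate-id(current())]"/>
        </xsl:copy>
    </xsl:template>

    <!-- In the default mode, drop the siblings that were adopted above.
         Text in a target without mrk has no preceding mrk, so it is kept. -->
    <xsl:template match="target/node()[not(self::mrk)]
                         [preceding-sibling::mrk[1][not(node())]]"/>

    <!-- Last adopted text node of a group (next sibling is a mrk or nothing):
         strip its trailing whitespace only, so inner spaces survive. -->
    <xsl:template mode="adopt"
                  match="text()[not(following-sibling::node()[1][not(self::mrk)])]">
        <xsl:call-template name="rtrim">
            <xsl:with-param name="s" select="string(.)"/>
        </xsl:call-template>
    </xsl:template>

    <!-- Any other adopted node (ph elements, earlier text) is copied as-is. -->
    <xsl:template match="node()" mode="adopt">
        <xsl:copy-of select="."/>
    </xsl:template>

    <!-- Recursive right-trim: removes one trailing whitespace character per call. -->
    <xsl:template name="rtrim">
        <xsl:param name="s"/>
        <xsl:choose>
            <xsl:when test="$s != '' and normalize-space(substring($s, string-length($s))) = ''">
                <xsl:call-template name="rtrim">
                    <xsl:with-param name="s" select="substring($s, 1, string-length($s) - 1)"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$s"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>
```

The trailing-only trim is used instead of calling 'normalize-space' on each text node because 'normalize-space' would also strip the space in text such as "Ядро " that sits directly before a ph tag. With any XSLT 1.0 processor this could be applied per file, for example `xsltproc -o fixed.xlf fix-mrk.xsl input.xlf`, where the file names are placeholders.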