2012-12-04

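Short question: how do I handle raw ampersands (&) in an XML input file? The XML comes from a URL query and is transformed with XPath/XSLT.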

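Addendum: I'm not even selecting the field that contains the &. The parser complains whenever an ampersand is present anywhere in the file.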

Long explanation: I'm processing XML generated from a URL response.

<NOTE>I%20hope%20this%20won%27t%20require%20a%20signature%3f%20%20 
There%20should%20be%20painters%20%26%20stone%20guys%20at%20the 
%20house%20on%20Wednesday%2c%20but%20depending%20on%20what%20time%20 
it%20is%20delivered%2c%20I%20can%27t%20guarantee%21%20%20 
Also%2c%20just%20want%20to%20make%20sure%20the%20billing%20address 
%20is%20different%20from%20shipping%20address%3f 
</NOTE> 

This URL-decodes to:

<NOTE>I hope this won't require a signature? 
There should be painters & stone guys at the 
house on Wednesday, but depending on what time it is delivered, I can't guarantee! 
Also, just want to make sure the billing address is different from shipping address? 
</NOTE> 

Problem: xsltproc chokes on the "&" in "painters & stone guys", reporting the following error and the text it stopped at:

xmlParseEntityRef: no name 
<NOTE>I hope this won't require a signature? There should be painters & 

It looks like xsltproc expects a closing </NOTE>.
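To see why the parser stops at that point, here is a minimal sketch using Python's standard xml.etree.ElementTree (it sits on expat rather than libxml2, so its error message differs from xsltproc's). The URL-encoded form is well-formed, while the decoded form with a bare & is rejected. `is_well_formed` is a hypothetical helper written only for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

encoded = "<NOTE>painters%20%26%20stone%20guys</NOTE>"
decoded = "<NOTE>painters & stone guys</NOTE>"

print(is_well_formed(encoded))  # True: %26 is just three ordinary characters
print(is_well_formed(decoded))  # False: a bare & must start an entity or character reference
```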

I've tried disable-output-escaping="yes" in every position I could think of, on xsl:output and on xsl:value-of.

I also tried xsltproc --decode-uri, but couldn't figure that one out; there's no documentation for it.

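Note: I don't know whether it's worth keeping the input in URL-encoded form and using a DOCTYPE like the one below (I'm not sure how to do that). The output ultimately goes to a browser.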

<!DOCTYPE xsl:stylesheet [ 
    <!ENTITY nbsp "&#160;"> 
    <!ENTITY copy "&#169;"> 
    <!ENTITY reg "&#174;"> 
]> 
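Declaring entities in a DOCTYPE only adds new named references such as &nbsp;; it does not make a lone & legal. A quick check with Python's standard-library parser, assuming the entity declarations are placed in the input document itself (the sample strings are made up for this sketch):

```python
import xml.etree.ElementTree as ET

DTD = '<!DOCTYPE NOTE [ <!ENTITY nbsp "&#160;"> ]>'

# A declared entity is expanded by the parser...
note = ET.fromstring(DTD + "<NOTE>painters&nbsp;guys</NOTE>")
print(repr(note.text))

# ...but a bare ampersand is still a well-formedness error.
try:
    ET.fromstring(DTD + "<NOTE>painters & stone guys</NOTE>")
    bare_rejected = False
except ET.ParseError as err:
    bare_rejected = True
    print("rejected:", err)
```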

I also tried encoding="UTF-8" and encoding="ISO-8859-1". Are there any other valid values to try?

Answer


If there is an unescaped ampersand, the XML is not well-formed. If you put the string inside <![CDATA[ ... ]]>, it should work.

<NOTE><![CDATA[I hope this won't require a signature? 
    There should be painters & stone guys at the 
    house on Wednesday, but depending on what time it is delivered, I can't guarantee! 
    Also, just want to make sure the billing address is different from shipping address?]]> 
</NOTE> 
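If you control the step that writes the XML, a small helper can wrap the decoded text in CDATA. One caveat: a CDATA section ends at the first ]]>, so text containing that sequence has to be split across two sections. `wrap_cdata` is a hypothetical helper for this Python sketch, not part of any library.

```python
import xml.etree.ElementTree as ET

def wrap_cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting any ']]>' so it can't end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

note_text = "There should be painters & stone guys at the house"
xml_text = "<NOTE>" + wrap_cdata(note_text) + "</NOTE>"

# The parser hands back the original text, ampersand and all.
print(ET.fromstring(xml_text).text == note_text)  # True
```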

Or, of course, use &amp; instead of &.
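In Python, for example, the standard library's xml.sax.saxutils.escape performs that replacement (along with < and >). This is just one way to do it in a preprocessing script; it is not anything specific to xsltproc.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "There should be painters & stone guys at the house"
escaped = escape(raw)  # replaces &, < and > with entity references
print(escaped)  # There should be painters &amp; stone guys at the house

# The escaped form parses, and the parser turns &amp; back into &.
assert ET.fromstring("<NOTE>" + escaped + "</NOTE>").text == raw
```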

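Edit: You can also translate the URL escapes into numeric character references, provided your XSLT processor supports disable-output-escaping (xsltproc does):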

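<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <!-- Identity template: copy everything not handled below unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy NOTE, but replace its URL-encoded content with decoded text. -->
  <xsl:template match="NOTE">
    <xsl:copy>
      <xsl:call-template name="decodeURL"/>
    </xsl:copy>
  </xsl:template>

  <!-- Recursively turn each %XX escape into the character reference &#xXX;.
       The output is written unescaped (disable-output-escaping), which is safe
       only because URL-encoded text contains no raw & or <. -->
  <xsl:template name="decodeURL">
    <xsl:param name="URL" select="string()"/>
    <xsl:choose>
      <xsl:when test="contains($URL,'%')">
        <xsl:variable name="remainingURL" select="substring-after($URL,'%')"/>
        <!-- Text before the %, then "&#x" + the two hex digits + ";" -->
        <xsl:value-of disable-output-escaping="yes" select="concat(
          substring-before($URL,'%'),
          '&amp;#x',
          substring($remainingURL,1,2),
          ';')"/>
        <!-- Continue with everything after the two hex digits. -->
        <xsl:call-template name="decodeURL">
          <xsl:with-param name="URL" select="substring($remainingURL,3)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- No escapes left: output the rest as ordinary text. -->
        <xsl:value-of select="$URL"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>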

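Of course, you don't have to run this transformation as a separate preprocessing step: you can reuse decodeURL in the stylesheet that turns your source XML (with its URL-encoded strings) into HTML or whatever other format you need.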
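The decodeURL recursion maps every %XX to &#xXX;. Writing the same mapping in Python (a hypothetical `to_char_refs` helper, for illustration only) makes one limitation easy to see: each escape becomes one character reference, which is correct for ASCII but not for multi-byte UTF-8 sequences such as %C3%A9. Those come out as two separate characters instead of é. Unlike the stylesheet, this sketch only rewrites % when it is followed by two hex digits.

```python
import re
import xml.etree.ElementTree as ET

def to_char_refs(url_text: str) -> str:
    """Replace each %XX escape with the numeric character reference &#xXX;."""
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: "&#x" + m.group(1) + ";", url_text)

def parsed_text(fragment: str) -> str:
    """Parse fragment as the content of a NOTE element and return its text."""
    return ET.fromstring("<NOTE>" + fragment + "</NOTE>").text

ascii_refs = to_char_refs("painters%20%26%20stone")
print(ascii_refs)               # painters&#x20;&#x26;&#x20;stone
print(parsed_text(ascii_refs))  # painters & stone

# Two UTF-8 bytes become two separate code points, not one "é".
print(parsed_text(to_char_refs("caf%C3%A9")))  # cafÃ©
```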


That's what I was trying to do. The input has %26.


And the offending string comes from a third party, so I either parse it with XSL, or maybe with sed, awk, or perl.


No need for sed/awk/perl: see my updated answer.
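If a preprocessing script is preferred anyway, a minimal Python sketch could parse the still-encoded (and therefore well-formed) response, URL-decode each NOTE element with urllib.parse.unquote, and let ElementTree re-escape the text when writing it back out. The result can then be fed to xsltproc. It assumes the encoded text sits directly inside NOTE elements; `decode_notes` and the ORDER wrapper are hypothetical names. unquote decodes %XX sequences as UTF-8 by default, so multi-byte escapes such as %C3%A9 come out as é.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

def decode_notes(xml_text: str) -> str:
    """URL-decode the text of every NOTE element and return re-serialized XML."""
    root = ET.fromstring(xml_text)
    for note in root.iter("NOTE"):
        if note.text:
            note.text = unquote(note.text)  # %26 -> &, %C3%A9 -> é
    # On output, ElementTree escapes & as &amp;, so the result stays well-formed.
    return ET.tostring(root, encoding="unicode")

response = "<ORDER><NOTE>painters%20%26%20stone%20caf%C3%A9</NOTE></ORDER>"
print(decode_notes(response))
# <ORDER><NOTE>painters &amp; stone café</NOTE></ORDER>
```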