Preserving special characters in the output when using XSLT

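My problem is about preserving special characters after an XSLT transformation. My source XHTML file contains several special characters, such as &nbsp;, &mdash; and &rsquo;, and they are dropped during the XSLT transformation.

I have tried various answers, such as this and this.

If I manually change a special character to its corresponding Unicode representation, the character is preserved in the output.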

For example, changing &nbsp; to &#160; produces a space in the output. Please refer to the sample files below:

Source XHTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:text="http://giraffe.wkle.com/text" xmlns:epub="http://www.idpf.org/2007/ops"> 
    <body> 
     <div class="section" id="section_1"> 
      <p id="para_1" class="para">Content&nbsp;of&nbsp;paragraph&mdash;1.</p> 
      <p id="para_2" class="para">Content&nbsp;of&nbsp;paragraph&mdash;2.</p> 
     </div> 
    </body> 
</html>

XSL template:

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema"> 

    <xsl:output method="xml" indent="yes"/> 
    <xsl:template match="node()|@*" name="identity"> 
     <xsl:copy> 
      <xsl:apply-templates select="node()|@*" /> 
     </xsl:copy> 
    </xsl:template> 

    <xsl:template match="*[local-name()='p']/text()"> 
     <xsl:copy-of select="."/> 
    </xsl:template> 
</xsl:stylesheet>

Expected output:

<?xml version="1.0" encoding="UTF-8"?> 
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xmlns:text="http://giraffe.wkle.com/text"> 
    <body> 
     <div class="section" id="section_1"> 
      <p class="para" id="para_1">Content of paragraph—1.</p> 
      <p class="para" id="para_2">Content of paragraph—2.</p> 
     </div> 
    </body> 
</html>

Actual output:

<?xml version="1.0" encoding="UTF-8"?> 
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xmlns:text="http://giraffe.wkle.com/text"> 
    <body> 
     <div class="section" id="section_1"> 
      <p class="para" id="para_1">Contentofparagraph1.</p> 
      <p class="para" id="para_2">Contentofparagraph2.</p> 
     </div> 
    </body> 
</html>

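Constraints:

I do not have access to modify the source XHTML content or its DTD.
The XSLT version is 1.0.

Please let me know whether there is any way to convert the special characters to their Unicode values and preserve them in my output XML document.

Update:

I invoke the transformation with this Java code: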

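import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XSLTUtil { 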

    public static String processXHTML(String sourceFileName, String outputXhtml, String xslFilePath) throws ParserConfigurationException, SAXException, IOException, TransformerException { 
     DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); 
     DocumentBuilder docbuilder = factory.newDocumentBuilder(); 
     Document doc = docbuilder.parse(new FileInputStream(sourceFileName)); 

     FileOutputStream fos = null; 
     FileInputStream fis = null; 
     try { 
      fos = new FileOutputStream(outputXhtml); 
      fis = new FileInputStream(xslFilePath); 
      TransformerFactory transformfactory = TransformerFactory.newInstance(); 
      Templates xsl = transformfactory.newTemplates(new StreamSource(fis)); 
      Transformer transformer = xsl.newTransformer(); 
      transformer.transform(new DOMSource(doc.getDocumentElement()),new StreamResult(fos)); 
      return outputXhtml; 
     } finally { 
      if(fos != null) { 
       fos.close(); 
      } 
      if(fis != null) { 
       fis.close(); 
      } 
     } 
    } 

    public static void main(String args[]){ 
     String sourceFileName = "C:\\source.xhtml"; 
     String outputXhtml = "C:\\output.xhtml"; 
     String xslFilePath = "C:\\xslTemplate.xsl"; 
     String result = "-1"; 
     try { 
      result = processXHTML(sourceFileName, outputXhtml, xslFilePath); 
     } catch (ParserConfigurationException e) { 
      e.printStackTrace(); 
     } catch (SAXException e) { 
      e.printStackTrace(); 
     } catch (IOException e) { 
      e.printStackTrace(); 
     } catch (TransformerException e) { 
      e.printStackTrace(); 
     } 
     System.out.println("Result : "+ result); 
    } 
}


2017-09-13 Subhashree Pradhan

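Which XSLT processor are you using, on which platform, and how exactly are you running the transformation?

I am using XSLT 1.0 with the Apache Xalan processor, and I invoke the transformation from Java. Please see the update above.

A solution using Apache Commons Lang3 StringEscapeUtils. The Maven dependency: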

<dependency> 
     <groupId>org.apache.commons</groupId> 
     <artifactId>commons-lang3</artifactId> 
     <version>3.0</version> 
    </dependency>

First read the content and replace all entities with the actual characters.

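// Needs commons-lang3 (StringEscapeUtils) plus java.nio.charset, java.nio.file,
// java.io, org.xml.sax and the javax.xml imports already used in the question.
public static String processXHTML(String sourceFileName, String outputXhtml,
        String xslFilePath) throws ParserConfigurationException, SAXException, IOException,
        TransformerException {

    // Read the whole source file as UTF-8 text.
    Charset charset = StandardCharsets.UTF_8;
    Path path = Paths.get(sourceFileName);
    String source = new String(Files.readAllBytes(path), charset);
    // Hide the XML built-in entities behind U+0001 so they stay escaped.
    source = source.replaceAll("\\&(amp|lt|gt|quot);", "\u0001$1;");
    // Replace every remaining HTML entity with the character it stands for.
    source = StringEscapeUtils.unescapeHtml4(source);
    // Turn the U+0001 placeholders back into ampersands.
    source = source.replace('\u0001', '&');
    byte[] bytes = source.getBytes(charset);
    ByteArrayInputStream bais = new ByteArrayInputStream(bytes);

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docbuilder = factory.newDocumentBuilder();
    // Short-circuit DTD loading: answer every external entity request with empty content.
    docbuilder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            System.out.printf("resolveEntity PUBLIC %s SYSTEM %s%n", publicId, systemId);
            return new InputSource(new StringReader(""));
        }
    });
    //Document doc = docbuilder.parse(new FileInputStream(sourceFileName));
    Document doc = docbuilder.parse(bais);

    // The transformation itself, as in the question.
    try (FileInputStream fis = new FileInputStream(xslFilePath);
         FileOutputStream fos = new FileOutputStream(outputXhtml)) {
        Transformer transformer = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(fis)).newTransformer();
        transformer.transform(new DOMSource(doc.getDocumentElement()), new StreamResult(fos));
        return outputXhtml;
    }
}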

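I skipped the XML built-in entities (&amp;, &lt;, &gt;, &quot;), so they stay escaped.

Because reading and processing the entities (over the network) takes a long time, I short-circuited the EntityResolver. It looks as if you already have something similar installed, because the entities are not being replaced.

Return null from resolveEntity to see the DTDs being loaded recursively.

Alternatively, you could install an XML catalog, which is a local cache of the HTML entity DTDs; several are available.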
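
The local-cache idea can be sketched without a full XML catalog: a resolver that answers the DTD request from memory with only the entity declarations the document needs. This is a sketch under stated assumptions, not part of the original answer: the class name LocalDtdDemo, the three-entity LOCAL_DTD string, and the decision to answer every request with the same content are illustrative. A real setup would serve complete local copies of the XHTML DTD and its entity files.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Sketch: resolve the XHTML DTD from memory instead of the network. */
final class LocalDtdDemo {

    // Assumption: only these three entities are needed. A real local copy
    // would carry the full XHTML entity sets.
    private static final String LOCAL_DTD =
            "<!ENTITY nbsp \"&#160;\">"
          + "<!ENTITY mdash \"&#8212;\">"
          + "<!ENTITY rsquo \"&#8217;\">";

    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Answer every external DTD request with the in-memory declarations.
            builder.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader(LOCAL_DTD)));
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p>Content&nbsp;of&nbsp;paragraph&mdash;1.</p></body></html>";
        System.out.println(parse(xhtml).getDocumentElement().getTextContent());
    }
}
```

With the declarations available, the parser can expand the named entities itself, so the source text does not have to be rewritten before parsing.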


2017-09-14 14:57:51

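Thank you, Joop! This answer worked for me. Now I can preserve most of the special characters. Could you explain the answer in more detail? I am new to XSLT and would like to understand what we are doing, especially here: source = source.replaceAll("\\&(amp|lt|gt|quot);", "\u0001$1;");

XML uses the entities &amp;, &lt;, &gt;, &quot; and &apos; for the characters &, <, >, " and ' when those characters are not meant as part of the XML syntax. You do not want unescapeHtml4 to expand these entities, which it would do because they are also HTML entities. So before calling unescapeHtml4 I replace their ampersand with an unused character, U+0001 (that is, Ctrl-A), and afterwards that replacement has to be undone.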
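
The placeholder trick described above can be tried in isolation. The sketch below uses only the JDK: a deliberately tiny, hypothetical entity table (HTML_ENTITIES) stands in for StringEscapeUtils.unescapeHtml4, and the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: expand HTML named entities while leaving the XML built-ins escaped. */
final class EntityPlaceholderDemo {

    // Hypothetical stand-in for a full HTML unescaper. Like a real one,
    // it also knows the XML built-ins, which is exactly the problem.
    private static final Map<String, String> HTML_ENTITIES = new LinkedHashMap<>();
    static {
        HTML_ENTITIES.put("&nbsp;", "\u00A0");
        HTML_ENTITIES.put("&mdash;", "\u2014");
        HTML_ENTITIES.put("&rsquo;", "\u2019");
        HTML_ENTITIES.put("&amp;", "&");
        HTML_ENTITIES.put("&lt;", "<");
        HTML_ENTITIES.put("&gt;", ">");
        HTML_ENTITIES.put("&quot;", "\"");
    }

    /** Naive unescaping: expands every known entity, built-ins included. */
    static String unescapeAll(String s) {
        for (Map.Entry<String, String> e : HTML_ENTITIES.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        return s;
    }

    /** Same expansion, but the XML built-ins survive as entity references. */
    static String unescapeKeepingXmlBuiltins(String source) {
        // 1. Hide the built-ins behind U+0001 so the unescaper does not match them.
        String hidden = source.replaceAll("&(amp|lt|gt|quot);", "\u0001$1;");
        // 2. Expand everything that still looks like an entity.
        String expanded = unescapeAll(hidden);
        // 3. Turn the placeholders back into ampersands.
        return expanded.replace('\u0001', '&');
    }

    public static void main(String[] args) {
        System.out.println(unescapeKeepingXmlBuiltins("R&amp;D&nbsp;&mdash;&lt;b&gt;"));
    }
}
```

Without step 1, an input such as R&amp;D would come back as R&D, which is no longer well-formed XML. The sketch assumes U+0001 never occurs in the source file.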

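[Earlier version of this answer, written for an outdated version of the question]

Consider using Andrew Welch's Lexev tool. It essentially preprocesses the XML to convert entity references into something that will survive XML parsing, and then postprocesses the transformation result to put the entity references back.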

http://andrewjwelch.com/lexev/
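
As a rough illustration of that preprocess/postprocess idea, here is a minimal sketch. It is not Lexev's implementation: the class name, the two private-use marker characters, and the regular expressions are assumptions chosen for the example, and the markers are assumed not to occur in the input.

```java
import java.util.regex.Pattern;

/** Sketch: hide named entity references from the parser, then restore them. */
final class EntityRoundTripDemo {

    // Private-use characters used as delimiters (an assumption of this sketch).
    private static final char OPEN = '\uE000';
    private static final char CLOSE = '\uE001';

    // Any named reference except the five XML built-ins.
    private static final Pattern ENTITY_REF =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos);)([A-Za-z][A-Za-z0-9]*);");
    private static final Pattern MARKED_REF =
            Pattern.compile(OPEN + "([A-Za-z][A-Za-z0-9]*)" + CLOSE);

    /** Run on the source text before it is parsed. */
    static String protect(String xml) {
        return ENTITY_REF.matcher(xml).replaceAll(OPEN + "$1" + CLOSE);
    }

    /** Run on the serialized transformation result. */
    static String restore(String xml) {
        return MARKED_REF.matcher(xml).replaceAll("&$1;");
    }

    public static void main(String[] args) {
        String source = "<p>Content&nbsp;of&nbsp;paragraph&mdash;1.</p>";
        String safe = protect(source);      // parse and transform this text
        System.out.println(restore(safe));  // then restore on the output
    }
}
```

Because the marked text contains no undeclared entity references, it can be parsed without the DTD. The restored output still contains named references, so whatever consumes it needs the entity declarations.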


2017-09-13 16:20:44

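Thanks for your time, Michael. That utility uses the Saxon parser and is invoked from Java. My setup is different. (I have updated my question.) Do you know of any approach that uses only XSLT and could help me solve this problem?

I would get rid of the DOM code. Creating a DOM when all you want to do is transform it into something else is clumsy and inefficient. Both Saxon and Xalan run faster if you supply a StreamSource or SAXSource and let them decide on their own tree representation. With Saxon it can be 5-10 times faster, and it will use much less memory.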
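
A minimal sketch of a transformation that hands the processor a StreamSource instead of a DOM. The class name and the IDENTITY stylesheet constant are hypothetical, and strings are used instead of files only to keep the example self-contained; with files, new StreamSource(new File(...)) would take their place.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: let the XSLT processor parse the input itself. */
final class StreamTransformDemo {

    // Identity stylesheet, equivalent to the first template in the question.
    static final String IDENTITY =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:template match=\"node()|@*\"><xsl:copy>"
          + "<xsl:apply-templates select=\"node()|@*\"/></xsl:copy></xsl:template>"
          + "</xsl:stylesheet>";

    static String transform(String xml, String xslt) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                    factory.newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            // No DocumentBuilder here: the processor reads the stream and
            // builds whatever internal tree it prefers.
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<p id=\"a\">x&#160;y</p>", IDENTITY));
    }
}
```

This removes the intermediate DOM but presumably does not by itself change how named entities are resolved: the parser behind the StreamSource still needs the entity declarations from the DTD.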

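I don't know why the DOM is losing the entity references. The DOM data model differs from the XSLT/XPath data model (particularly in how it handles entity expansion), and you have given conflicting information about which XSLT processor you are using, so this is not easy to investigate.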


2017-09-14 13:54:22
