2017-04-14 167 views

I am using an XML transformer to convert one XML document into another. Some non-English characters do not survive the transformation.

Original XML:

<?xml version="1.0" encoding="UTF-8"?> 
<RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0 xmlns:RR_KeyPersonExpanded_2_0="http://apply.grants.gov/forms/RR_KeyPersonExpanded_2_0-V2.0" xmlns:att="http://apply.grants.gov/system/Attachments-V1.0" xmlns:glob="http://apply.grants.gov/system/Global-V1.0" xmlns:globLib="http://apply.grants.gov/system/GlobalLibrary-V2.0" RR_KeyPersonExpanded_2_0:FormVersion="2.0"> 
    <RR_KeyPersonExpanded_2_0:KeyPerson> 
     <RR_KeyPersonExpanded_2_0:Profile> 
     <RR_KeyPersonExpanded_2_0:Name> 
      <globLib:PrefixName>候.</globLib:PrefixName> 
      <globLib:FirstName>Lakshmi</globLib:FirstName> 
      <globLib:MiddleName>AB</globLib:MiddleName> 
      <globLib:LastName>Sørensen</globLib:LastName> 
     </RR_KeyPersonExpanded_2_0:Name> 
     </RR_KeyPersonExpanded_2_0:Profile> 
    </RR_KeyPersonExpanded_2_0:KeyPerson> 
</RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0> 

removeemptytags.xsl:

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
<xsl:strip-space elements="*"/> 
<xsl:output indent="yes" omit-xml-declaration="yes" encoding="UTF-8" method="xml"/> 
<xsl:template match="@*|node()"> 
    <xsl:copy> 
    <xsl:apply-templates select="@*|node()"/> 
    </xsl:copy> 
</xsl:template> 

<xsl:template match="*[not(descendant-or-self::*[text()[normalize-space()] | @*])]"/> 

</xsl:stylesheet> 

Java code:

public String removeEmptyTags(String xml) { 
    String filteredXML = ""; 
    try (OutputStream bos = new ByteArrayOutputStream();) { 
     TransformerFactory transformerFactory = TransformerFactory.newInstance(); 
     StreamSource inputXMLSource = new StreamSource(new ByteArrayInputStream(xml.getBytes("UTF-8"))); 
     StreamSource xsltSource = new StreamSource(getClass().getClassLoader().getResourceAsStream("removeemptytags.xsl")); 
     Transformer transformer = transformerFactory.newTransformer(xsltSource); 

     StreamResult result = new StreamResult(bos); 
     transformer.transform(inputXMLSource, result); 
     bos.flush(); 
     filteredXML = bos.toString(); 
    } catch (Exception e) { 
     logger.log(Level.SEVERE, "Exception while removing empty tags : ", e); 
     throw new ParsingException(e.getMessage()); 
    } 
    return filteredXML; 
} 

Output XML:

<RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0 xmlns:RR_KeyPersonExpanded_2_0="http://apply.grants.gov/forms/RR_KeyPersonExpanded_2_0-V2.0" xmlns:att="http://apply.grants.gov/system/Attachments-V1.0" xmlns:glob="http://apply.grants.gov/system/Global-V1.0" xmlns:globLib="http://apply.grants.gov/system/GlobalLibrary-V2.0" RR_KeyPersonExpanded_2_0:FormVersion="2.0"> 
<RR_KeyPersonExpanded_2_0:KeyPerson> 
<RR_KeyPersonExpanded_2_0:Profile> 
<RR_KeyPersonExpanded_2_0:Name> 
<globLib:PrefixName>候.</globLib:PrefixName> 
<globLib:FirstName>Lakshmi</globLib:FirstName> 
<globLib:MiddleName>AB</globLib:MiddleName> 
<globLib:LastName>Sørensen</globLib:LastName> 
</RR_KeyPersonExpanded_2_0:Name> 
</RR_KeyPersonExpanded_2_0:Profile> 
</RR_KeyPersonExpanded_2_0:KeyPerson> 
</RR_KeyPersonExpanded_2_0:RR_KeyPersonExpanded_2_0> 

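As you can see, the non-English words just turn into a bunch of meaningless characters. I tried changing the encoding in the XSLT to "UTF-16", but that did not work. Has anyone here run into the same problem?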


Is your output encoding set to UTF-8? – Compass

Answers


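To get that many weird characters, you seem to have more than one encoding problem.

First, when reading the XML into the xml string (code not shown). Since we can't see how you did that, we can't help much with it, although you probably forgot to specify UTF-8 encoding.

Second, when calling bos.toString(). ByteArrayOutputStream.toString() decodes the bytes using the platform default charset, which is not necessarily UTF-8. If you want the result as a String, don't use an OutputStream; use a StringWriter (see the rewritten removeEmptyTags method below).

Third, when writing the string to a file (code not shown). Again, we can't really help with this since we don't know how you do it, although you probably forgot to specify UTF-8 encoding.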
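To illustrate why several such mistakes compound, here is a minimal sketch. It assumes the wrong charset is ISO-8859-1 (standing in for whatever the platform default happens to be), and the class and method names are hypothetical, not part of the question's code.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes as UTF-8, then decodes with the wrong charset.
    // ISO-8859-1 is an assumption standing in for a non-UTF-8 platform default.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "S\u00f8rensen";  // "Sørensen", 8 characters
        String once = misdecode(original);  // one wrong decoding step
        String twice = misdecode(once);     // the same mistake repeated
        System.out.println(original.length() + " " + once.length() + " " + twice.length());
    }
}
```

Each wrong step turns every non-ASCII character into several: the single ø becomes two characters after one pass and four after two, which is consistent with the amount of garbage described in the question.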

public String removeEmptyTags(String xml) {
    // Collect the output as characters, so no byte-to-String decoding step is needed.
    try (StringWriter out = new StringWriter()) {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        // The input bytes are explicitly UTF-8, matching the XML declaration.
        StreamSource inputXMLSource = new StreamSource(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        StreamSource xsltSource = new StreamSource(getClass().getClassLoader().getResourceAsStream("removeemptytags.xsl"));
        Transformer transformer = transformerFactory.newTransformer(xsltSource);

        transformer.transform(inputXMLSource, new StreamResult(out));
        return out.toString();
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Exception while removing empty tags : ", e);
        throw new ParsingException(e.getMessage());
    }
}
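
For anyone who wants to try the String-in, String-out approach in isolation, the following is a self-contained sketch. It is not the code above: it swaps in an inline identity stylesheet (a hypothetical stand-in for removeemptytags.xsl), reads the input through a StringReader instead of a byte stream, and uses made-up class and method names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StringTransformDemo {
    // Hypothetical inline stylesheet: an identity copy only, no empty-tag removal.
    static final String IDENTITY_XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\" encoding=\"UTF-8\"/>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Characters in, characters out: no byte arrays, so no charset to get wrong.
    static String transform(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(IDENTITY_XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String result = transform("<LastName>S\u00f8rensen</LastName>");
        // Checks that the non-ASCII character survived the round trip.
        System.out.println(result.contains("S\u00f8rensen"));
    }
}
```

Because both ends are character streams, the only places left where an encoding can go wrong are the ones outside this method: where the input String was read and where the result is eventually written.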

Actually, it would be better to do all of this directly from and to files, and let the XML library figure out the encoding:

public void removeEmptyTags(Path inFile, Path outFile) {
    // Byte streams on both sides: the parser reads the encoding from the input's
    // XML declaration, and the serializer writes bytes in the xsl:output encoding.
    try (InputStream in = Files.newInputStream(inFile);
         OutputStream out = Files.newOutputStream(outFile)) {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        StreamSource xsltSource = new StreamSource(getClass().getClassLoader().getResourceAsStream("removeemptytags.xsl"));
        Transformer transformer = transformerFactory.newTransformer(xsltSource);

        transformer.transform(new StreamSource(in), new StreamResult(out));
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Exception while removing empty tags : ", e);
        throw new ParsingException(e.getMessage());
    }
}
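
If the XML does have to pass through a String on its way from or to a file (the first and third points above), the charset should be named explicitly at both ends. The sketch below assumes the files are known to be UTF-8; the class and method names are hypothetical, since the question's file-handling code is not shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8FileIo {
    // Reads the whole file and decodes it as UTF-8, never the platform default.
    // Assumption: the file really is UTF-8; the XML declaration is not consulted.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Encodes the text as UTF-8 and writes it, replacing any existing content.
    static void writeUtf8(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("utf8-demo", ".xml");
        try {
            String xml = "<LastName>S\u00f8rensen</LastName>";
            writeUtf8(tmp, xml);
            // Checks that the text read back matches what was written.
            System.out.println(readUtf8(tmp).equals(xml));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

This only helps when the encoding is known in advance. Handing the byte streams to the XML library, as in the method above, stays the more robust choice because the parser can honor whatever the XML declaration says.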

You're right!! I was encoding multiple times. For the output, I need the result as a string. I simply use byte[] b = StringUtils.toBytesUTF8(filteredXML) to do the encoding. –