2014-02-23

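Java encoding from UTF-8 to ISO-8859-1 for an XML file

I have been trying to convert a UTF-8 string to its ISO-8859-1 equivalent so that I can output it in an XML document, but whatever I try, the output is always displayed incorrectly.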

To simplify the problem, I created a code snippet that contains all the tests I ran, and I copy/pasted the generated document after it.

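You can also be sure that I tried every possible combination of new String(xxx.getBytes("UTF-8"), "ISO-8859-1"), swapping UTF-8 and ISO-8859-1 and sometimes using the same value for both. Nothing worked!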
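
A minimal sketch of why that pattern cannot help: a Java String has no byte encoding of its own (it is UTF-16 internally), so re-decoding its UTF-8 bytes as ISO-8859-1 only yields a different sequence of characters. The target encoding matters only at the point where characters become bytes. The class and method names below are hypothetical, chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {
    // The pattern from the question: take the UTF-8 bytes and decode them as ISO-8859-1.
    // Each UTF-8 byte becomes one character, so a two-byte sequence turns into two characters.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Encoding to bytes is the step where ISO-8859-1 actually applies.
    static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "héllo", written with an escape so the source file encoding does not matter
        String original = "h\u00e9llo";

        System.out.println(misdecode(original).length()); // 6: the single 'é' became two characters
        System.out.println(toLatin1(original).length);    // 5: one byte per character
        System.out.println(Integer.toHexString(toLatin1(original)[1] & 0xFF)); // e9
    }
}
```

In other words, the string stored in the DOM should stay as it is; the conversion belongs to whatever writes the bytes of the response.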

Here is the snippet with my tests:

// @see http://stackoverflow.com/questions/229015/encoding-conversion-in-java 
private static String changeEncoding(String input) throws Exception { 
    // Create the encoder and decoder for ISO-8859-1 
    Charset charset = Charset.forName("ISO-8859-1"); 
    CharsetDecoder decoder = charset.newDecoder(); 
    CharsetEncoder encoder = charset.newEncoder(); 

    // Convert a string to ISO-LATIN-1 bytes in a ByteBuffer 
    // The new ByteBuffer is ready to be read. 
    ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(input)); 

    // Convert ISO-LATIN-1 bytes in a ByteBuffer to a CharBuffer and then to a string.
    // The new CharBuffer is ready to be read.
    CharBuffer cbuf = decoder.decode(bbuf); 
    return cbuf.toString(); 
} 

// @see http://stackoverflow.com/questions/655891/converting-utf-8-to-iso-8859-1-in-java-how-to-keep-it-as-single-byte 
private static String byteEncoding(String input) throws Exception { 
    Charset utf8charset = Charset.forName("UTF-8"); 
    Charset iso88591charset = Charset.forName("ISO-8859-1"); 

    ByteBuffer inputBuffer = ByteBuffer.wrap(input.getBytes()); 

    // decode UTF-8 
    CharBuffer data = utf8charset.decode(inputBuffer); 

    // encode ISO-8859-1
    ByteBuffer outputBuffer = iso88591charset.encode(data); 
    byte[] outputData = outputBuffer.array(); 
    return new String(outputData, "ISO-8859-1"); 
} 

public static Result home() throws Exception { 
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance(); 
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder(); 

    //root elements 
    Document doc = docBuilder.newDocument(); 
    doc.setXmlVersion("1.0"); 
    doc.setXmlStandalone(true); 

    Element rootElement = doc.createElement("test"); 
    doc.appendChild(rootElement); 

    rootElement.setAttribute("original", "héllo"); 

    rootElement.setAttribute("stringToString", new String("héllo".getBytes("UTF-8"), "ISO-8859-1")); 

    rootElement.setAttribute("stringToBytes", changeEncoding("héllo")); 

    rootElement.setAttribute("stringToBytes2", byteEncoding("héllo")); 

    TransformerFactory tf = TransformerFactory.newInstance(); 
    Transformer transformer = tf.newTransformer(); 
    transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1"); 

    StringWriter writer = new StringWriter(); 
    transformer.transform(new DOMSource(doc), new StreamResult(writer)); 
    String output = writer.getBuffer().toString().replaceAll("\n|\r", ""); 

    // The following is specific to the Play! Framework for rendering a URL, but I believe this is not the problem (I checked in the developer console: the document is correctly served as "ISO-8859-1")
    response().setHeader("Content-Type", "text/xml; charset=ISO-8859-1"); 
    return ok(output).as("text/xml"); 
} 

And the result:

<?xml version="1.0" encoding="ISO-8859-1"?> 
<test original="héllo" stringToBytes="héllo" stringToBytes2="héllo" stringToString="héllo"/> 

How should I proceed?


I think you misspelled 'response'. If you are talking about 'response()' in the Play! Framework, there is no 'setCharacterEncoding()' (I am using Play! 2.1.5). There is no 'setCharacterEncoding()' in the documentation either.


Thanks for your help. I have set the encoding to "ISO-8859-1" by calling 'setHeader'. There is no 'encoding' in Play v2.1.5 (there is CONTENT_ENCODING, but it is final).


Sorry. I read 1.2.5 instead of 2.1.5.

Answer


For reasons I cannot explain, writing the XML to a file and returning that file as the response solved the encoding problem.

I decided to keep this question in case someone else runs into a similar problem.

Here is the snippet:

TransformerFactory tf = TransformerFactory.newInstance(); 
Transformer transformer = tf.newTransformer(); 
transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1"); 

File file = new File("Path/to/file.xml"); 
transformer.transform(new DOMSource(doc), new StreamResult(file)); 

response().setHeader("Content-Disposition", "attachment;filename=" + file.getName()); 
response().setHeader("Content-Type", "text/xml; charset=ISO-8859-1"); 
return ok(file).as("text/xml");
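
A plausible explanation, not confirmed anywhere in this thread: a StreamResult backed by a File (or any OutputStream) makes the Transformer encode the bytes itself using OutputKeys.ENCODING, whereas a StringWriter only collects characters, so the bytes are produced later by whatever writes the string to the response. Under that assumption, the same effect should be obtainable in memory with a ByteArrayOutputStream, as sketched below. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class Latin1XmlBytes {
    // Builds a small document and serializes it to ISO-8859-1 bytes in memory.
    static byte[] serialize() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("test");
        // Plain Java string, no manual re-encoding; escaped so the source file encoding does not matter
        root.setAttribute("original", "h\u00e9llo");
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

        // A byte-oriented target: the transformer performs the character-to-byte encoding
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = serialize();
        // Decode with the same charset only to inspect the result
        System.out.println(new String(xml, StandardCharsets.ISO_8859_1));
    }
}
```

How to hand the resulting byte array to the response depends on the Play version and is left out here; the point is only that the bytes, not a String, should be what leaves the application.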