2017-02-09 67 views
0

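Convert emoji to HTML decimal codes or Unicode hex codes in Java

I am trying to use Java to convert a text file that contains emoji into a file in which each emoji is replaced by its HTML decimal code or Unicode hex code. For example: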

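Input: <div id="thread" style="white-space: pre-wrap;"><div>😀😀😃🍎🍏⚽️🏀

Expected output: <div id="thread" style="white-space: pre-wrap;"><div>&#128512;&#128512;&#128515;&#127822;&#127823;&#9917;&#65039;&#127936;

In the example above, '😀' should be changed to its corresponding HTML entity code '&#128512;'.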

The HTML entity codes and hex codes are listed in detail here: http://character-code.com/emoticons-html-codes.php
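For readers unfamiliar with how an emoji relates to a number such as 128512, here is a minimal sketch of the mapping. The class and method names (`CodePointForms`, `decimalEntity`, `hexEntity`) are made up for illustration and are not from the question. It relies only on the fact that Java stores characters outside the Basic Multilingual Plane as a surrogate pair, and that `String.codePointAt` recombines the pair into a single code point.

```java
public class CodePointForms {

    /** Formats a code point as an HTML decimal reference, e.g. &#128512; */
    public static String decimalEntity(int codePoint) {
        return "&#" + codePoint + ";";
    }

    /** Formats a code point as an HTML hexadecimal reference, e.g. &#x1F600; */
    public static String hexEntity(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
    }

    public static void main(String[] args) {
        // U+1F600 (grinning face), written as a surrogate pair so the
        // source file encoding does not matter.
        String grinning = "\uD83D\uDE00";

        int codePoint = grinning.codePointAt(0);

        System.out.println(grinning.length());        // 2 (two UTF-16 chars)
        System.out.println(decimalEntity(codePoint)); // &#128512;
        System.out.println(hexEntity(codePoint));     // &#x1F600;
    }
}
```

The decimal and hex forms name the same code point, so either can be used in HTML.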

The sample code I tried is below:

try {
    File file = new File("/inFile.txt");
    str = FileUtils.readFileToString(file, "ISO-8859-1");
    System.out.println(new String(str.getBytes(), "UTF-8"));
    String results = StringEscapeUtils.escapeHtml4(str);
    System.out.println(results);
} catch (IOException e) {
    e.printStackTrace();
}

+1

So your code does something, you don't show us the code, and then you ask why it doesn't work? *Really?!?!?* – Andreas

+0

I have added the sample code I tried. –

+1

Are you sure the file is encoded as ISO-8859-1? That seems... unlikely. – dnault
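The encoding concern raised in this comment can be illustrated with a small sketch. It assumes the input file is actually UTF-8, which the thread does not confirm, and the class name `CharsetMismatchDemo` and its `decode` helper are hypothetical. Decoding UTF-8 bytes as ISO-8859-1 turns each byte into a separate character, so a single emoji becomes four unrelated characters before any escaping is attempted.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    /** Decodes raw bytes with the given charset. */
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+1F600 as it would be stored in a UTF-8 file: F0 9F 98 80.
        byte[] utf8Bytes = "\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);

        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);

        System.out.println(wrong.length());       // 4 (one char per byte)
        System.out.println(right.codePointAt(0)); // 128512

        // The damage is reversible only if the same charset is used to get
        // the bytes back; getBytes() with no argument uses the platform
        // default charset, which may not be ISO-8859-1.
        String repaired = new String(
                wrong.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
        System.out.println(repaired.equals(right)); // true
    }
}
```

If the file really is UTF-8, reading it as UTF-8 in the first place avoids the round trip entirely.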

Answers

0
I found a workaround:
import java.io.File;
import java.io.OutputStreamWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// The class name is illustrative; the original post showed only the method.
public class HtmlDecimalCodeGenerator {

    public static void htmlDecimalCodeGenerator() {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setValidating(false);

        // The input must be well-formed XML for the parser to accept it.
        File inputFile = new File("/inputFile.xml");

        try {
            DocumentBuilder builder = domFactory.newDocumentBuilder();
            Document doc = builder.parse(inputFile);

            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer = tf.newTransformer();

            /*
             * If OMIT_XML_DECLARATION is not set, an XML declaration such as
             * <?xml version='1.0' encoding='UTF-32'?> is written at the start
             * of the output (the encoding named there follows the ENCODING
             * property).
             */
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            /*
             * When the output method is "xml", the version value specifies the
             * version of XML used for the result tree; the default is 1.0.
             * When the output method is "html", the version value indicates
             * the version of HTML; the default is 4.0, meaning the result
             * should conform to the HTML 4.0 Recommendation. When the output
             * method is "text", the version property is ignored.
             */
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");

            /*
             * INDENT specifies whether the Transformer may add additional
             * whitespace when outputting the result tree; the value must be
             * "yes" or "no".
             */
            transformer.setOutputProperty(OutputKeys.INDENT, "no");

            /*
             * This is the core of the workaround: ISO-8859-1 cannot represent
             * emoji, so the serializer is expected to write them as numeric
             * character references instead of raw characters.
             */
            transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

            // transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

            // To write to a file instead of the console, wrap
            // new FileOutputStream("/outputFile.xml") here and close it afterwards.
            OutputStreamWriter out = new OutputStreamWriter(System.out, "UTF-8");
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            out.flush();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
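As an alternative to going through an XML serializer, the conversion can also be sketched directly on code points. This is not the method from the answer above. It is a minimal sketch that assumes the input is UTF-8 text and that every non-ASCII character (not only emoji) should become a decimal reference. The names `EmojiEntityEncoder` and `escapeNonAscii` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EmojiEntityEncoder {

    /**
     * Replaces every code point above 127 with a decimal numeric character
     * reference and leaves ASCII (including HTML markup) untouched.
     */
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // codePoints() yields whole code points, so a surrogate pair is
        // seen as one value rather than two separate chars.
        text.codePoints().forEach(cp -> {
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String text;
        if (args.length > 0) {
            // Assumption: the file named on the command line is UTF-8.
            text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        } else {
            // Built-in sample: U+1F600, then U+26BD followed by U+FE0F.
            text = "<div>\uD83D\uDE00\u26BD\uFE0F";
        }
        System.out.println(escapeNonAscii(text));
        // With the built-in sample this prints: <div>&#128512;&#9917;&#65039;
    }
}
```

Because ASCII is passed through unchanged, existing tags such as `<div>` survive. The variation selector U+FE0F that follows some emoji is escaped as its own reference.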