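StringEscapeUtils.unescapeHtml does not work on strings read from a file

I am trying to read a file that contains unicode characters, convert those characters to their corresponding symbols, and then print the resulting text to a new file. I am trying to use StringEscapeUtils.unescapeHtml for this, but the lines are printed as they are, with the unicode points still intact. As a test, I copied one line from the file, created a string from it, and called StringEscapeUtils.unescapeHtml on it, which worked perfectly. My code is below: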

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;

import org.apache.commons.lang3.StringEscapeUtils;

class FileWrite
{
    public static void main(String[] args)
    {
        try {
            String testString = " \"text\":\"Dude With Knit Hat At Party Calls Beer \u2018Libations\u2019 http://t.co/rop8NSnRFu\" ";

            FileReader instream = new FileReader("Home Timeline.txt");
            BufferedReader b = new BufferedReader(instream);

            FileWriter fstream = new FileWriter("out.txt");
            BufferedWriter out = new BufferedWriter(fstream);

            // This gives the desired output, with unicode points converted
            out.write(StringEscapeUtils.unescapeHtml3(testString) + "\n");

            // readLine() returns null at end of file, so check before using the value
            String line = b.readLine();

            while (line != null) {
                out.write(StringEscapeUtils.unescapeHtml3(line) + "\n");
                line = b.readLine();
            }

            // Close the input and output streams
            b.close();
            out.close();
        }
        catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }
}

Source

2013-05-14 Aonghus McGovern

//This gives the desired output, 
//with unicode points converted 
out.write(StringEscapeUtils.unescapeHtml3(testString) + "\n");

This is where you are going wrong. Java unescapes string literals of this form at compile time, when they are built into the class file:

"\u2018Libations\u2019"

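There is no HTML 3 escaping in this code. The method you have chosen is designed to unescape escape sequences of the form &lsquo;.

You probably want the unescapeJava method.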
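To make the distinction concrete, here is a minimal sketch using only the standard library. It shows that a `\u2018` written in a Java string literal is already a single character at run time, while the same six characters read from a file remain six characters until something decodes them. The class `UnicodeEscapeDemo` and the helper `decodeUnicodeEscapes` are hypothetical names introduced for illustration; the helper handles only `\uXXXX` sequences and is not a replacement for `StringEscapeUtils.unescapeJava`, which also covers other Java escapes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeDemo {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Hypothetical helper (an assumption, not part of Commons Lang):
    // replaces each backslash-u escape found in the text with the character it names.
    static String decodeUnicodeEscapes(String input) {
        Matcher m = UNICODE_ESCAPE.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The compiler turns these escapes into real quote characters.
        String literal = "\u2018Libations\u2019";
        // This is what a line read from the file looks like: backslash, u, digits.
        String fromFile = "\\u2018Libations\\u2019";

        System.out.println(literal.length());   // 11
        System.out.println(fromFile.length());  // 21
        System.out.println(decodeUnicodeEscapes(fromFile).equals(literal)); // true
    }
}
```

With Commons Lang on the classpath, calling `StringEscapeUtils.unescapeJava(line)` on each line read from the file would be the library route to the same result.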

Source

2013-05-15 10:33:14 McDowell

You are absolutely right. Thank you very much. –

This worked for me. Thanks. You saved me time – Shailesh

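Your strings are being read and written using your platform's default encoding. You should explicitly specify the character set as 'UTF-8':

Input stream: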

BufferedReader b = new BufferedReader(new InputStreamReader(
     new FileInputStream("Home Timeline.txt"), 
     Charset.forName("UTF-8")));

Output stream:

BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
     new FileOutputStream("out.txt"), 
     Charset.forName("UTF-8")));
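The same idea can be sketched with the `java.nio.file.Files` helpers, which accept a charset directly. This is only an illustration under assumptions: the class `Utf8CopyDemo`, the helper `copyAsUtf8`, and the temporary file names are hypothetical. Note that an explicit charset only controls how bytes are turned into characters; it does not decode backslash-u text that is literally present in the file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8CopyDemo {
    // Hypothetical helper: copies a text file line by line,
    // decoding the input and encoding the output as UTF-8.
    static void copyAsUtf8(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for "Home Timeline.txt" and "out.txt".
        Path in = Files.createTempFile("timeline", ".txt");
        Path out = Files.createTempFile("out", ".txt");
        Files.write(in, "Beer \u2018Libations\u2019\n".getBytes(StandardCharsets.UTF_8));

        copyAsUtf8(in, out);

        String copied = new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
        System.out.println(copied.contains("\u2018Libations\u2019"));
    }
}
```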

Source

2013-05-14 19:29:08 Perception

Unfortunately, this did not work. –
