2010-04-20 98 views
1

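I am using Java I/O to retrieve text from a server that may output characters such as é. When I then print them with System.err, they turn into '?'. I am using UTF-8 encoding. What is going wrong?

int len = 0;
char[] buffer = new char[1024]; 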
OutputStream os = sock.getOutputStream(); 
InputStream is = sock.getInputStream(); 
os.write(query.getBytes("UTF8"));//iso8859_1")); 

Reader reader = new InputStreamReader(is, Charset.forName("UTF-8")); 
do { 
    len = reader.read(buffer); 
    if (len > 0) { 
     if (outstring == null) { 
      outstring = new StringBuffer(); 
     } 
     outstring.append(buffer, 0, len); 
    } 
} while (len > 0); 
System.err.println(outstring); 

Edit: I just tried the following code:

StringBuffer b = new StringBuffer(); 
for (char c = 'a'; c < 'd'; c++) { 
    b.append(c); 
} 
b.append('\u00a5'); // Japanese Yen symbol 
b.append('\u01FC'); // Roman AE with acute accent 
b.append('\u0391'); // GREEK Capital Alpha 
b.append('\u03A9'); // GREEK Capital Omega 

for (int i = 0; i < b.length(); i++) { 
    System.out.println("Character #" + i + " is " + b.charAt(i)); 
} 
System.out.println("Accumulated characters are " + b); 

It comes out as garbage as well:

 
Character #0 is a 
Character #1 is b 
Character #2 is c 
Character #3 is ¥ 
Character #4 is ? 
Character #5 is ? 
Character #6 is ? 
Accumulated characters are abc¥??? 
+0

Reformatted the code; please revert if incorrect. – trashgod 2010-04-20 04:57:19

+0

Although unrelated to this question, 'StringBuilder' is preferred for this kind of usage. – trashgod 2010-04-20 05:10:47

Answers

0

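Write the text to a file and check how it comes out. If it appears correctly in the file, the problem is with your error stream (its encoding is not UTF-8). If it is still garbage in the file, your server's encoding may not be UTF-8.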
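A related check that does not depend on any viewer's encoding is to print the code points of the decoded text as plain ASCII. The sketch below is only an illustration; the class name `CodePointDump` and the method `describe` are made up for this example. If the values printed are the expected ones (for instance U+00E9 for é), the text was decoded correctly and the fault lies with the output stream or the display.

```java
class CodePointDump {
    // Formats every code point of the text as ASCII, e.g. "U+00A5",
    // so the result survives any console encoding.
    static String describe(CharSequence text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same characters as the test string in the question.
        System.out.println(describe("abc\u00a5\u01FC\u0391\u03A9"));
        // prints: U+0061 U+0062 U+0063 U+00A5 U+01FC U+0391 U+03A9
    }
}
```

In the question's code, `describe(outstring)` could be printed in place of `outstring` itself.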

+0

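The file came out, but another reference program reads and displays the Unicode characters just fine (I don't have the source code for that program). – user121196 2010-04-20 04:40:34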

+0

I changed the encoding to UTF-8 in Eclipse and ran the newly added code, and the output comes out properly. Please check it that way. – sreejith 2010-04-20 05:04:07

2

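First, verify that the system property (file.encoding) is actually UTF8. If it is, then your problem is not the code you are running but your terminal program (or whatever else displays the output), which cannot render the output correctly.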
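A minimal sketch of that check follows. The class name `ConsoleEncodingCheck` and the helper `printThrough` are invented for illustration. It prints the `file.encoding` property and the JVM default charset, shows that a `PrintStream` whose charset cannot represent a character emits '?' instead, and wraps `System.err` in a `PrintStream` that encodes as UTF-8. Which property actually governs the console can differ between JDK versions, and the terminal still has to be set to UTF-8 for the last line to display correctly.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

class ConsoleEncodingCheck {
    // Prints the text through a PrintStream that uses the named charset
    // and returns the raw bytes that reached the underlying stream.
    static byte[] printThrough(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(sink, true, charsetName);
        ps.print(text);
        ps.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // GREEK CAPITAL ALPHA has no US-ASCII encoding, so the stream writes '?'.
        byte[] ascii = printThrough("\u0391", "US-ASCII");
        System.out.println("US-ASCII output byte: " + (char) ascii[0]);

        // Explicitly encode stderr output as UTF-8 instead of the default charset.
        PrintStream utf8Err = new PrintStream(System.err, true, "UTF-8");
        utf8Err.println("abc\u00a5\u01FC\u0391\u03A9");
    }
}
```

If the last line still shows '?' or mojibake even though the bytes are UTF-8, the terminal or IDE console setting is the remaining suspect.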

0

Your second example produces the following output for me.

Character #0 is a 
Character #1 is b 
Character #2 is c 
Character #3 is ¥ 
Character #4 is Ǽ 
Character #5 is Α 
Character #6 is Ω 
Accumulated characters are abc¥ǼΑΩ 

This code produces a correctly encoded UTF-8 file with the same content.

StringBuilder b = new StringBuilder(); 
for (char c = 'a'; c < 'd'; c++) { 
    b.append(c); 
} 
b.append('\u00a5'); // Japanese Yen symbol 
b.append('\u01FC'); // Roman AE with acute accent 
b.append('\u0391'); // GREEK Capital Alpha 
b.append('\u03A9'); // GREEK Capital Omega 

PrintStream out = new PrintStream("temp.txt", "UTF-8"); 
for (int i = 0; i < b.length(); i++) { 
    out.println("Character #" + i + " is " + b.charAt(i)); 
} 
out.println("Accumulated characters are " + b); 
out.close(); // flush and release the file

See also: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)