A strange character


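String str = "ิ"; // U+0E34, a single UTF-16 code unit
System.out.println(str.length()); // prints 1
// getBytes() without an argument uses the platform default charset
byte[] b = str.getBytes();
System.out.println(b[0]);
System.out.println(b[1]);
System.out.println(b[2]);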

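The above is my code. There is a special char in str. Its length is 1, but it takes 3 bytes. How can I make it one? How do I print this char using Java code? Also, on my Android phone, this character cannot be deleted.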

Source

2015-05-04 CoolEgos

Can you be more specific? – Blip

Read up on Unicode and how to handle it in programming. – Julian

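This is because the string is "encoded" into bytes. According to the documentation:

Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array. The behavior of this method when this string cannot be encoded in the default charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required.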
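As a minimal sketch of what the quoted documentation implies: passing an explicit charset to getBytes removes the dependence on the platform default. The class name CharsetDemo and its helper methods are hypothetical, made up for illustration; the charset constants come from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: number of bytes the string occupies in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical helper: number of bytes in UTF-16BE (no byte order mark is written).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String str = "\u0E34"; // THAI CHARACTER SARA I
        System.out.println(utf8Length(str));  // 3
        System.out.println(utf16Length(str)); // 2
    }
}
```

The same one-character string yields a different number of bytes depending on the charset, which is why the byte count should not be compared with length().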

Source

2015-05-04 14:02:28

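It looks like your special character is encoded in UTF-8. UTF-8 characters take a different number of bytes depending on where their code point falls in the Unicode range.

You can find the algorithm on the Wikipedia page for UTF-8 and see how the size is determined.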
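The size rule can be sketched in a few lines. This is an illustration of the standard UTF-8 ranges, not code from the thread; the class Utf8Size and the method utf8ByteCount are hypothetical names. Surrogate code points (U+D800 to U+DFFF) are not valid in well-formed UTF-8, and this sketch does not treat them specially.

```java
public class Utf8Size {
    // Hypothetical helper: bytes UTF-8 needs to encode one Unicode code point.
    static int utf8ByteCount(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        if (codePoint <= 0x7F) return 1;   // U+0000..U+007F (ASCII)
        if (codePoint <= 0x7FF) return 2;  // U+0080..U+07FF
        if (codePoint <= 0xFFFF) return 3; // U+0800..U+FFFF
        return 4;                          // U+10000..U+10FFFF
    }

    public static void main(String[] args) {
        // U+0E34 falls in the U+0800..U+FFFF range.
        System.out.println(utf8ByteCount(0x0E34)); // 3
    }
}
```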

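From the Java String length() documentation:

The length is equal to the number of Unicode code units in the string.

Your character is a single UTF-16 code unit, so length() returns 1. Encoded as UTF-8 it takes 3 bytes, which is why the byte array has 3 elements even though the string length is 1.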
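To see the three measures side by side, here is a small sketch that reports UTF-16 code units, code points, and UTF-8 bytes for a string. LengthDemo and describe are hypothetical names; the second sample, U+1F600, is an assumption added only to show a character that needs two code units.

```java
import java.nio.charset.StandardCharsets;

public class LengthDemo {
    // Hypothetical helper: summarizes the different "lengths" of a string.
    static String describe(String s) {
        int units = s.length();                                  // UTF-16 code units
        int points = s.codePointCount(0, s.length());            // Unicode code points
        int utf8Bytes = s.getBytes(StandardCharsets.UTF_8).length;
        return "units=" + units + " points=" + points + " utf8Bytes=" + utf8Bytes;
    }

    public static void main(String[] args) {
        System.out.println(describe("\u0E34"));
        // units=1 points=1 utf8Bytes=3
        System.out.println(describe(new String(Character.toChars(0x1F600))));
        // units=2 points=1 utf8Bytes=4
    }
}
```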

Source

2015-05-04 14:04:01

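Length is not bytes

You have only 1 character, but that character takes 3 bytes. A string is made up of characters, but that does not mean 1 character will be 1 byte.

About the character "ิ"

Java strings use Unicode (UTF-16) internally. ิ is actually U+0E34, which is the Thai character SARA I.

About your encoding problem

You need to change the way your application handles charset encoding and use UTF-8 instead.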
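One possible way to do that when printing, sketched under assumptions: wrap the output in a PrintStream that always encodes as UTF-8 instead of relying on the platform default. Utf8Print and utf8Stream are hypothetical names. Whether the glyph is actually displayed still depends on the terminal and font, and since U+0E34 is a vowel sign that normally attaches to a preceding consonant, it may look odd when printed on its own.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Print {
    // Hypothetical helper: a PrintStream that encodes everything it prints as UTF-8.
    static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        out.println("\u0E34"); // written as the bytes E0 B8 B4 plus a line separator
    }
}
```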

Source

2015-05-04 14:12:52 Mekap

In addition to all the other comments, here is a small snippet to demonstrate it.

String str = "ิ"; // \u0E34 
System.out.println("character length: " + str.length()); 

System.out.print("bytes: "); 
for (byte b : str.getBytes(java.nio.charset.StandardCharsets.UTF_8)) { // constant avoids the checked UnsupportedEncodingException
    System.out.append(Integer.toHexString(b & 0xFF).toUpperCase() + " "); 
} 
System.out.println(""); 

int codePoint = Character.codePointAt(str, 0); 
System.out.println("unicode name of the codepoint: " + Character.getName(codePoint));

Output

character length: 1 
bytes: E0 B8 B4 
unicode name of the codepoint: THAI CHARACTER SARA I

Source

2015-05-04 14:33:03 SubOptimal
