Java encoding misunderstanding

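I can't make sense of a tricky issue with encoding.

Why is it that when you double the length of a string, the length of its encoded byte array grows only by a factor of 1.5?

Code: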

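public class Appl {
    public static void main(String[] args) throws Exception {
        // Number of bytes produced when encoding one character as UTF-16
        System.out.println("A".getBytes("UTF-16").length);
        // Number of bytes produced when encoding two characters as UTF-16
        System.out.println("AA".getBytes("UTF-16").length);
    }
}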

The output will be:

4
6

This may look a bit silly, but I can't figure out why this happens.

Any suggestions?

Source

2013-11-23 nazar_art

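The UTF-16 encoding uses an optional byte-order mark (BOM) to identify the byte order. See the Charset class for information about the different charsets.

If you use, for example, UTF-16BE (big-endian) instead, no BOM is written and you get the expected result: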

System.out.println("A".getBytes("UTF-16BE").length); // 2 (2 + 2 with UTF-16) 
System.out.println("AA".getBytes("UTF-16BE").length); // 4 (2 + 4 with UTF-16) 
System.out.println("AAA".getBytes("UTF-16BE").length); // 6 (2 + 6 with UTF-16)
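The arithmetic behind the 1.5 factor can be checked with a small sketch. The class name Utf16Lengths and its helper methods are made up for illustration; only String.getBytes and the StandardCharsets constants come from the JDK. With the BOM, n characters of 'A' take 2 + 2n bytes, so going from one character to two gives 6 / 4 = 1.5 rather than 2.

```java
import java.nio.charset.StandardCharsets;

class Utf16Lengths {
    // Hypothetical helper: builds a string of n 'A' characters.
    static String repeatA(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('A');
        }
        return sb.toString();
    }

    // Byte count with the "UTF-16" charset, which prepends a 2-byte BOM when encoding.
    static int withBom(int n) {
        return repeatA(n).getBytes(StandardCharsets.UTF_16).length;
    }

    // Byte count with "UTF-16BE", which fixes the byte order and writes no BOM.
    static int withoutBom(int n) {
        return repeatA(n).getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " chars: UTF-16 = " + withBom(n)
                    + " bytes, UTF-16BE = " + withoutBom(n) + " bytes");
        }
    }
}
```

Using the StandardCharsets constants instead of charset names also removes the need for "throws Exception", since no UnsupportedEncodingException can be thrown.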

Source

2013-11-23 11:51:07

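The first two bytes are the byte order mark; see http://en.wikipedia.org/wiki/Byte_Order_Mark. After that, each additional Java char takes two bytes (Java uses UTF-16 internally, but some Unicode code points are encoded as two Java chars).

To see in detail what is going on, just print the byte array with Arrays.toString(...). The Unicode code point of 'A' is 65.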
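To illustrate the remark about code points that need two Java chars, here is a minimal sketch. The class SurrogateDemo and the helper utf16Bytes are hypothetical names, and U+1F600 is just one example of a code point outside the Basic Multilingual Plane. UTF-16BE is used so that no BOM is counted.

```java
import java.nio.charset.StandardCharsets;

class SurrogateDemo {
    // Hypothetical helper: size of the UTF-16BE encoding (no BOM) in bytes.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Formats: number of Java chars, number of code points, number of encoded bytes.
    static String describe(String s) {
        return s.length() + " chars, "
                + s.codePointCount(0, s.length()) + " code points, "
                + utf16Bytes(s) + " bytes";
    }

    public static void main(String[] args) {
        String a = "A";                    // U+0041, fits in a single char
        String emoji = "\uD83D\uDE00";     // U+1F600, stored as a surrogate pair
        System.out.println(describe(a));     // 1 chars, 1 code points, 2 bytes
        System.out.println(describe(emoji)); // 2 chars, 1 code points, 4 bytes
    }
}
```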
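A minimal sketch of that suggestion follows. The class name BomDemo and the helper dump are made up for illustration; Arrays.toString, String.getBytes and StandardCharsets are standard JDK APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BomDemo {
    // Hypothetical helper: encodes s with the given charset and renders the bytes as text.
    static String dump(String s, Charset cs) {
        return Arrays.toString(s.getBytes(cs));
    }

    public static void main(String[] args) {
        System.out.println(dump("A", StandardCharsets.UTF_16));   // [-2, -1, 0, 65]
        System.out.println(dump("AA", StandardCharsets.UTF_16));  // [-2, -1, 0, 65, 0, 65]
        System.out.println(dump("A", StandardCharsets.UTF_16BE)); // [0, 65]
        System.out.println(dump("A", StandardCharsets.UTF_16LE)); // [65, 0]
    }
}
```

The leading -2, -1 are the signed-byte forms of 0xFE 0xFF, the big-endian byte order mark. The remaining pairs 0, 65 are the two-byte encoding of 'A'.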

Source

2013-11-23 11:50:44
