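Encoding - different results between codePointCount and length

I ran into a tricky case and could not find any answer explaining why it happens.

The main question is how long the string is.

Does it contain one character or two?

Code: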

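public class App {
    public static void main(String[] args) throws Exception {
        char ch0 = 55378; // 0xD852, a high (leading) surrogate
        char ch1 = 56816; // 0xDDF0, a low (trailing) surrogate
        String str = new String(new char[]{ch0, ch1});
        System.out.println(str);
        System.out.println(str.length());             // number of UTF-16 code units
        System.out.println(str.codePointCount(0, 2)); // number of Unicode code points
        System.out.println(str.charAt(0));            // first code unit on its own
        System.out.println(str.charAt(1));            // second code unit on its own
    }
}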

Output:

? 
2 
1 
? 
?

Any suggestions?

Source

2013-11-23 nazar_art

I suggest you spend some time reading [this article](http://kunststube.net/encoding/)

What output did you expect?

Does it contain one character or two?

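It contains one Unicode character, which is made up of two UTF-16 code units. Each char in Java is a UTF-16 code unit, so a single char may not be a complete character. Each character has exactly one code point: Unicode defines a coded character set that maps each character to an integer representing it (the code point).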
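As a rough illustration of how two code units map to one code point, here is a minimal sketch using the two char values from the question. The class name `SurrogatePairDemo` and the helper `combine` are made up for this example; the `Character` methods are standard JDK calls.

```java
public class SurrogatePairDemo {
    // Hypothetical helper: the standard UTF-16 arithmetic that turns a
    // high/low surrogate pair into the code point it encodes.
    static int combine(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        char high = 55378; // 0xD852, same value as ch0 in the question
        char low = 56816;  // 0xDDF0, same value as ch1 in the question

        // Each char alone is only half of a character.
        System.out.println(Character.isHighSurrogate(high)); // true
        System.out.println(Character.isLowSurrogate(low));   // true

        // Together they encode a single code point.
        int codePoint = Character.toCodePoint(high, low);
        System.out.println(Integer.toHexString(codePoint));  // 249f0
        System.out.println(codePoint == combine(high, low)); // true

        // Number of char values needed to represent this code point.
        System.out.println(Character.charCount(codePoint));  // 2
    }
}
```

Printing `charAt(0)` or `charAt(1)` on its own therefore prints an unpaired surrogate, which is not a displayable character by itself.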

length() returns the number of code units, whereas codePointCount returns the number of code points.
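The difference can be seen on a string that mixes an ordinary character with the surrogate pair from the question. This is only a sketch: the class name `LengthVsCodePoints` and its two helper methods are invented for illustration, and the loop is one possible way to walk a string by code point rather than by char.

```java
public class LengthVsCodePoints {
    // Hypothetical helpers wrapping the two String methods being compared.
    static int codeUnits(String s) {
        return s.length();
    }

    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "A" followed by the character from the question (U+249F0).
        String str = "A" + new String(new char[]{(char) 55378, (char) 56816});

        System.out.println(codeUnits(str));  // 3: one char for 'A', two for the pair
        System.out.println(codePoints(str)); // 2: 'A' and U+249F0

        // Step through the string one code point at a time.
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            System.out.printf("U+%04X%n", cp); // U+0041, then U+249F0
            i += Character.charCount(cp);      // advance by 1 or 2 chars
        }
    }
}
```

Passing `s.length()` as the end index avoids hard-coding the `2` used in the question, so the count stays correct for strings of any length.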

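You might want to look at my article on encodings in .NET. The terminology all carries over (since it is standard terminology), so just ignore the .NET-specific parts.

Source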

2013-11-23 12:33:18

This is exactly what I was looking for
