Get the Unicode value of a character

50

You can do it for any Java char using this one-liner:

System.out.println("\\u" + Integer.toHexString('÷' | 0x10000).substring(1));

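But it only works for Unicode characters up to Unicode 3.0, which is why I specified that you can do it for any Java char.

Java was designed well before Unicode 3.1 appeared, so Java's char primitive is inadequate to represent Unicode 3.1 and up: there is no longer a "one Unicode character to one Java char" mapping (a monstrous hack is used instead).

So you really need to check your requirements here: do you need to support Java char, or any possible Unicode character?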
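To make the distinction concrete, here is a minimal sketch of how a character above U+FFFF shows up in a Java String. The class name CharVersusCodePoint and the helper sample() are made up for illustration, and U+1F600 is just an arbitrary code point outside the BMP.

```java
final class CharVersusCodePoint {

    // Hypothetical helper: builds a String holding the single code point U+1F600.
    static String sample() {
        return new String(Character.toChars(0x1F600));
    }

    public static void main(String[] args) {
        String s = sample();

        // Number of 16-bit Java chars (UTF-16 code units) in the String.
        System.out.println(s.length());

        // Number of Unicode code points in the String.
        System.out.println(s.codePointCount(0, s.length()));

        // The first char alone is only half of a surrogate pair.
        System.out.println(Character.isHighSurrogate(s.charAt(0)));
    }
}
```

If your input can never contain such characters, treating one char as one character is fine; otherwise you have to work with code points.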

Source

2010-02-08 09:07:44 SyntaxT3rr0r

+0

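Thanks. I checked all the characters with this method and it looks fine now. – Saurabh 2010-02-08 09:27:02

+4

The "monstrous hack" is UTF-16, which is widely used. It may not be ideal, but it is well understood and far better than supporting only UCS-2. – 2010-02-08 09:47:28

+1

@Joachim: still, it is ugly that 'String.charAt' now returns "half a character" and 'String.length' returns something that can differ from the number of characters, isn't it? (Character here means a Unicode code point, not a Java char.) The String class should be independent of encoding issues (and it was before Unicode 3.1). – Thilo 2010-02-08 09:57:40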

0

I found this nice code on the web.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class Unicode {

    public static void main(String[] args) {
        System.out.println("Use CTRL+C to quit the program.");

        // Create the reader for reading in the text typed in the console.
        InputStreamReader inputStreamReader = new InputStreamReader(System.in);
        BufferedReader bufferedReader = new BufferedReader(inputStreamReader);

        try {
            String line;
            // Stop on end of input (readLine() returns null) or on an empty line.
            while ((line = bufferedReader.readLine()) != null && line.length() > 0) {
                for (int index = 0; index < line.length(); index++) {

                    // Convert the integer to a hexadecimal code.
                    String hexCode = Integer.toHexString(line.codePointAt(index)).toUpperCase();

                    // Pad with leading zeros and keep the last four hex digits.
                    // Note: this truncates code points above U+FFFF.
                    String hexCodeWithAllLeadingZeros = "0000" + hexCode;
                    String hexCodeWithLeadingZeros = hexCodeWithAllLeadingZeros.substring(hexCodeWithAllLeadingZeros.length() - 4);

                    System.out.println("\\u" + hexCodeWithLeadingZeros);
                }
            }
        } catch (IOException ioException) {
            ioException.printStackTrace();
        }
    }
}

Original Article

Source

2010-02-08 08:45:11

+2

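Google for the win – 2010-02-08 08:56:38

+0

Thanks. You gave me what I asked for. But when I try some Russian characters, it returns the same Unicode value for all of them. I think the Unicode value should be different for different characters. I tried the following characters: л, и, ц, т, я. Each returns \u003F. – Saurabh 2010-02-08 09:05:20

+1

I'm pretty sure this code is incorrect for code points above 0xFFFF. – SyntaxT3rr0r 2010-02-08 09:12:01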

31

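If you have Java 5, use char c = ...; String s = String.format ("\\u%04x", (int)c);

If your source isn't a single Unicode character (char) but a String, you must use charAt(index) to get the Unicode character at position index.

Don't use codePointAt(index), because it returns full Unicode code points (up to 21 bits, U+10FFFF), which can't be represented with just 4 hex digits (they need up to 6). See the docs for an explanation.

To be clear: this answer doesn't use Unicode as such but the way Java represents Unicode characters (i.e. surrogate pairs), since char is 16 bits and a Unicode code point needs up to 21 bits. The question should be: "How can I convert a char to a 4-digit hex number?", since it's not (really) about Unicode.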
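As a rough sketch of the charAt approach for a whole String (the class CharEscapeSketch and the method toEscapes are names invented here, not part of any library), each 16-bit char is formatted on its own, so a character outside the BMP comes out as two escapes, one per surrogate:

```java
final class CharEscapeSketch {

    // Hypothetical helper: formats every Java char of the input as a four-digit hex escape.
    static String toEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            // charAt returns one UTF-16 code unit, which always fits in four hex digits.
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toEscapes("ab"));
        // A single code point above U+FFFF yields two escapes (a surrogate pair).
        System.out.println(toEscapes(new String(Character.toChars(0x1F600))));
    }
}
```

The output is in the same form the Java compiler accepts in source code, which may or may not be what you need.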

Source

2010-02-08 09:13:09

+0

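Cast the char to int first – Bozho 2010-02-08 09:22:45

+2

@Aaron Digulla: it's a common mistake to think that charAt(...) returns a Unicode character. It doesn't. charAt(...) returns a Unicode character only if your String is made up of Unicode 3.0/BMP characters. I disagree that he shouldn't use codePointAt. He should use codePointAt together with a method that can encode characters outside the BMP. – SyntaxT3rr0r 2010-02-08 09:24:24

+0

codePointAt would be better, but assuming you really need it, working out the right value for the index gets tricky. – Thilo 2010-02-08 09:33:48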
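One way to handle the index, sketched under the assumption that you want every code point in order: advance by Character.charCount(cp) instead of by 1. The class CodePointWalkSketch, the method describe, and the U+XXXX output style are illustrative choices, not something from the answers above.

```java
final class CodePointWalkSketch {

    // Hypothetical helper: lists the code points of the input as space-separated U+XXXX labels.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // At least four hex digits; code points above U+FFFF need five or six.
            sb.append(String.format("U+%04X", cp));
            // Skip one char for BMP code points, two for a surrogate pair.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "a" + new String(Character.toChars(0x1F600));
        System.out.println(describe(text));
    }
}
```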

9

private static String toUnicode(char ch) { 
    return String.format("\\u%04x", (int) ch); 
}

Source

2013-08-07 08:20:01

+5

This duplicates an existing answer from 3 years earlier. – 2015-05-29 18:01:59

4

char c = 'a'; 
String a = Integer.toHexString(c); // gives you---> a = "61"
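
Note that Integer.toHexString does not pad with zeros. A small sketch of two ways to get four digits, one using the `| 0x10000` trick from the one-liner earlier in this thread and one using String.format (PadHexSketch and both method names are made up for this example):

```java
final class PadHexSketch {

    // Setting bit 16 guarantees five hex digits; dropping the leading '1' leaves four.
    static String padWithOr(char c) {
        return Integer.toHexString(c | 0x10000).substring(1);
    }

    // Lets the formatter add the leading zeros instead.
    static String padWithFormat(char c) {
        return String.format("%04x", (int) c);
    }

    public static void main(String[] args) {
        char c = 'a';
        System.out.println(Integer.toHexString(c)); // unpadded
        System.out.println(padWithOr(c));
        System.out.println(padWithFormat(c));
    }
}
```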

Source

2014-06-11 14:29:37

0

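Are you set on using the Unicode notation? With Java it is simpler to write your program using the "dec" value (the HTML code), because then you can simply cast between the char and int data types.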

char a = 98; 
char b = 'b'; 
char c = (char) (b + 2);

System.out.println(a); 
System.out.println((int)b); 
System.out.println((int)c); 
System.out.println(c);

This gives the following output:

b
98
100
d

Source

2015-02-26 03:33:59

0

First, I get the high side of the char. Then I get the low side. Convert everything to a hex string and add the prefix.

char c = 'a'; // example input

int hs = (c >> 8) & 0xFF; // high byte
int ls = c & 0xFF;        // low byte

// Pad each byte to two hex digits so the result always has four.
String highSide = String.format("%02x", hs);
String lowSide = String.format("%02x", ls);

System.out.println(c + " = " + "\\u" + highSide + lowSide);

Source

2015-04-12 21:47:51
