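Going back to the original question, "How do I check whether a byte array contains a Unicode string in Java?": I found that the term "Java Unicode" essentially refers to UTF-16 code units. I worked through the problem myself and wrote some code that may help anyone with this kind of question find some answers.
I have written two main methods: one displays UTF-8 code units and the other displays UTF-16 code units. UTF-16 code units are what you will encounter in Java and JavaScript, commonly written in the form "\ud83d".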
For more help with code units and conversions, try this website:
https://r12a.github.io/apps/conversion/
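As a quick illustration of the difference between the two kinds of code unit, here is a minimal sketch that prints both forms for one character. The class and method names (`CodeUnitDemo`, `utf8Hex`, `utf16Hex`) and the sample character U+1F600 are my own choices for this example. It uses `String.format` for the hex conversion rather than hand-written helpers.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class CodeUnitDemo {
    // Hex dump of the UTF-8 code units (bytes) of a string.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    // Hex dump of the UTF-16 code units (Java chars) of a string.
    static String utf16Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("%04x ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "\ud83d\ude00"; // U+1F600 written as a surrogate pair
        System.out.println(utf8Hex(s));  // f0 9f 98 80
        System.out.println(utf16Hex(s)); // d83d de00
    }
}
```

The same code point takes four UTF-8 code units but only two UTF-16 code units, which is why the "\ud83d" form always shows up together with a second "\udXXX" unit.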
Here is the code...
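    // "text" is the value to inspect (for example a String or StringBuilder).
    // Pass the charset explicitly: a bare getBytes() uses the platform default,
    // which is not guaranteed to be UTF-8. Needs: import java.nio.charset.StandardCharsets;
    byte[] array_bytes = text.toString().getBytes(StandardCharsets.UTF_8);
    char[] array_chars = text.toString().toCharArray();
    System.out.println();
    byteArrayToUtf8CodeUnits(array_bytes);
    System.out.println();
    charArrayToUtf16CodeUnits(array_chars);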
    // Prints each byte of the array as two hex digits. These are UTF-8 code units
    // only if the bytes were produced with the UTF-8 charset.
    public static void byteArrayToUtf8CodeUnits(byte[] byte_array)
    {
        System.out.println("array.length: = " + byte_array.length);
        for (int k = 0; k < byte_array.length; k++)
        {
            System.out.println("array byte: " + "[" + k + "]" + " converted to hex" + " = " + byteToHex(byte_array[k]));
        }
    }
    // Prints each char of the array as four hex digits. A Java char is one UTF-16
    // code unit, so a character outside the BMP shows up as two entries (a surrogate pair).
    public static void charArrayToUtf16CodeUnits(char[] char_array)
    {
        System.out.println("array.length: = " + char_array.length);
        for (int i = 0; i < char_array.length; i++)
        {
            System.out.println("array char: " + "[" + i + "]" + " converted to hex" + " = " + charToHex(char_array[i]));
        }
    }
    // Returns the two-digit lowercase hex representation of byte b.
    public static String byteToHex(byte b)
    {
        char[] hexDigit =
        {
            '0', '1', '2', '3', '4', '5', '6', '7',
            '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
        };
        // Masking with 0x0f keeps the index in range even when b is negative.
        char[] array = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
        return new String(array);
    }
    // Returns the four-digit lowercase hex representation of char c (one UTF-16 code unit).
    public static String charToHex(char c)
    {
        byte hi = (byte) (c >>> 8);   // high-order byte
        byte lo = (byte) (c & 0xff);  // low-order byte
        return byteToHex(hi) + byteToHex(lo);
    }
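The methods above only display code units; they do not say whether a byte array is well-formed text. For the original question, one possible approach (a sketch of my own, not the only way) is to run the bytes through a strict `CharsetDecoder` and see whether decoding fails. The class name `Utf8Check` and method name `isValidUtf8` are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class Utf8Check {
    // Returns true if the bytes decode as well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // 0xc3 starts a two-byte sequence, but 0x28 is not a continuation byte.
        byte[] bad = { (byte) 0xc3, (byte) 0x28 };
        System.out.println(isValidUtf8(good)); // true
        System.out.println(isValidUtf8(bad));  // false
    }
}
```

Note that passing this check only means the bytes are well-formed UTF-8. Plain ASCII passes too, and it cannot prove that the bytes were actually intended as text.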
A similar question already has some useful links from Edward Wilde - http://stackoverflow.com/questions/377294/howto-identify-utf-8-encoded-strings – JonoW 2009-07-28 10:23:35