Converting plain-text Unicode hex codes to a String

I receive a Unicode string like this from an external server:

005400610020007400650020007400ED0020007400FA0020003F0020003A0029

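and I have to decode it in Java. I know the '\u' prefix does the magic (i.e. '\u0054' -> 'T'), but I don't know how to convert the whole thing into an ordinary String.

Thanks in advance.

Edit: Thanks, everyone. All the answers work, but I could only accept one :(

Thanks again.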

Asked 2010-07-27 by Aito

It looks like UTF-16 encoding? Here is a method to convert it:

import java.io.UnsupportedEncodingException;

// The class name is arbitrary; the original snippet did not show it.
public class HexDecoder {

    // Treats every two hex digits as one byte, then decodes the bytes with the given encoding.
    public static String decode(String hexCodes, String encoding) throws UnsupportedEncodingException {
        if (hexCodes.length() % 2 != 0)
            throw new IllegalArgumentException("Illegal input length");
        byte[] bytes = new byte[hexCodes.length() / 2];
        for (int i = 0; i < bytes.length; i++)
            bytes[i] = (byte) Integer.parseInt(hexCodes.substring(2 * i, 2 * i + 2), 16);
        return new String(bytes, encoding);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String hexCodes = "005400610020007400650020007400ED0020007400FA0020003F0020003A0029";
        System.out.println(decode(hexCodes, "UTF-16"));
    }
}

Your example returns "Ta te tí tú ? :)".
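When decoding, Java's "UTF-16" charset looks for a byte-order mark and falls back to big-endian if there is none, which is why the code above works on this input. A minimal sketch of a variant that names the byte order explicitly via StandardCharsets.UTF_16BE (Java 7 and later), which also avoids the checked UnsupportedEncodingException. The class name is made up for illustration, and big-endian is an assumption based on the sample data:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original answer.
final class Utf16BeHexDecoder {

    // Decodes a string of hex digits as big-endian UTF-16 bytes (assumed byte order).
    static String decode(String hexCodes) {
        if (hexCodes.length() % 2 != 0) {
            throw new IllegalArgumentException("Illegal input length");
        }
        byte[] bytes = new byte[hexCodes.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Two hex digits make one byte.
            bytes[i] = (byte) Integer.parseInt(hexCodes.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String hexCodes = "005400610020007400650020007400ED0020007400FA0020003F0020003A0029";
        System.out.println(decode(hexCodes));
    }
}
```

If the server could ever send little-endian data or a byte-order mark, the charset would need to be chosen accordingly.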

Answered 2010-07-27 14:46:43

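You can simply split the string into substrings of length 4 and then use Integer.parseInt(s, 16) to get the numeric value of each one. Cast that value to a char and build a String out of the chars. For the example above, you will get:

Ta te tí tú ? :)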

Answered 2010-07-27 14:40:25 by Moritz

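This only works for UTF-16. Text encoded as UTF-8 will come out as garbage. – 2010-07-27 14:53:58

The question indicates that the source is definitely UTF-16 encoded. – Philipp 2010-07-27 14:55:45

@Philipp: You're right, it is implied when he says that the '\u' prefix does the magic. But I believe the title deserves a more general answer :) – 2010-07-27 15:04:46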
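The point made in these comments can be checked by decoding the same hex string under both encodings. A small sketch, assuming the hypothetical helper names below; it turns hex digits into bytes and then interprets those bytes with a caller-supplied charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are for illustration only.
final class HexCharsetDemo {

    // Converts pairs of hex digits into raw bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Illegal input length");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    // Interprets the hex-encoded bytes using the given charset.
    static String decode(String hex, Charset charset) {
        return new String(hexToBytes(hex), charset);
    }

    public static void main(String[] args) {
        // "tú " encoded as UTF-8 is the four bytes 74 C3 BA 20.
        String utf8Hex = "74C3BA20";
        // Correct charset: prints the original three characters.
        System.out.println(decode(utf8Hex, StandardCharsets.UTF_8));
        // Wrong charset: the bytes pair up as U+74C3 and U+BA20, two unrelated characters.
        System.out.println(decode(utf8Hex, StandardCharsets.UTF_16BE));
    }
}
```

So a general-purpose decoder needs to be told, or otherwise know, which encoding the hex digits represent.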

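It can be interpreted as UTF-16 or as UCS-2 (a sequence of code points, each encoded in 2 bytes and written in hexadecimal); as long as we stay inside the BMP, the two are equivalent. An alternative way to parse it: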

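// The class name is arbitrary; the original snippet did not show it.
public class MyDecoder {

    // Treats every four hex digits as one 16-bit code unit and appends it as a char.
    public static String mydecode(String hexCode) {
        if (hexCode.length() % 4 != 0)
            throw new IllegalArgumentException("Illegal input length");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hexCode.length(); i += 4)
            sb.append((char) Integer.parseInt(hexCode.substring(i, i + 4), 16));
        return sb.toString();
    }

    public static void main(String[] args) {
        String hexCodes = "005400610020007400650020007400ED0020007400FA0020003F0020003A0029";
        System.out.println(mydecode(hexCodes));
    }
}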
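Outside the BMP, the char-by-char approach still works if the input is UTF-16, because each group of four hex digits is then one UTF-16 code unit and a Java String is a sequence of such units; a supplementary character simply arrives as a surrogate pair. A small sketch to check this, assuming the input really is UTF-16 rather than strict UCS-2 (the class name and the sample input are illustrative only):

```java
// Hypothetical demo class; not part of the original answer.
final class SurrogatePairDemo {

    // Same idea as above: four hex digits become one char (one UTF-16 code unit).
    static String decodeUnits(String hexCode) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hexCode.length(); i += 4) {
            sb.append((char) Integer.parseInt(hexCode.substring(i, i + 4), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' followed by U+1F600, which UTF-16 encodes as the surrogate pair D83D DE00.
        String s = decodeUnits("0041D83DDE00");
        System.out.println(s.length());                            // 3 code units
        System.out.println(s.codePointCount(0, s.length()));       // 2 code points
        System.out.println(Integer.toHexString(s.codePointAt(1))); // 1f600
    }
}
```

The String ends up with three chars but only two code points, so code that counts or indexes characters should use the code point methods when such input is possible.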

Answered 2010-07-27 17:09:45 by leonbloy
