Converting UTF-8 to UTF-16

I want to convert Chinese characters to Unicode escape format, such as '\uXXXX', but when I use str.encode('utf-16be') it shows:

b'\xOO\xOO'

So I wrote some code to do what I need, as follows:

data = "index=索引?"
print(data.encode('UTF-16LE'))

def convert(s):
    # Pair up the UTF-16BE bytes into four-digit hex code units.
    return_code = []
    temp = ''
    for n in s.encode('utf-16be'):
        if temp == '':
            # First byte of a code unit, zero-padded to two hex digits.
            temp = format(n, '02x')
        else:
            # Second byte completes the code unit.
            return_code.append(temp + format(n, '02x'))
            temp = ''
    return return_code

print(convert(data))

Can anyone suggest how to do this conversion in Python 3.x?

Source

2013-11-26 alvinshih

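What is the encoding of the file in which you define the string? – Kimvais

I'm not sure what the problem is. UTF-16LE is not Unicode, but it is what Microsoft calls "Unicode". Describe your goal, not your process. –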

"index=索引?".encode('utf-16be') gives b'\x00i\x00n\x00d\x00e\x00x\x00=}"_\x15\x00?' – lvc

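I'm not sure I understand you.

Unicode is like a type. In Python 3 all strings are Unicode, so when you write data = "index=索引?", data is already Unicode. If all you want is another representation for display, you can use: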

def display_unicode(data): 
    return "".join(["\\u%s" % hex(ord(l))[2:].zfill(4) for l in data]) 

>>> data = "index=索引?" 
>>> print(display_unicode(data)) 
\u0069\u006e\u0064\u0065\u0078\u003d\u7d22\u5f15\u003f

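Note that the resulting string contains literal backslashes and hex digits, not the Unicode characters themselves.

There are other alternatives as well: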

>>> data.encode('ascii', 'backslashreplace') 
b'index=\\u7d22\\u5f15?' 
>>> data.encode('unicode_escape') 
b'index=\\u7d22\\u5f15?'
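
For characters outside the Basic Multilingual Plane, display_unicode emits five hex digits and the two codecs emit a \UXXXXXXXX escape. If one \uXXXX per UTF-16 code unit is wanted (surrogate pairs included), a minimal sketch is to encode to UTF-16BE and read the bytes two at a time. The name utf16_escape is hypothetical and is not part of the answer above.

```python
def utf16_escape(text):
    """Return text as \\uXXXX escapes, one per UTF-16 code unit."""
    raw = text.encode('utf-16-be')  # big-endian, no BOM
    # Each code unit is two bytes; format each pair as four hex digits.
    return ''.join(
        '\\u%04x' % int.from_bytes(raw[i:i + 2], 'big')
        for i in range(0, len(raw), 2)
    )

print(utf16_escape("索引"))        # \u7d22\u5f15
print(utf16_escape("\U0001F600"))  # \ud83d\ude00 (a surrogate pair)
```

Going through the encoded bytes rather than ord() is what makes surrogate pairs come out as two separate escapes.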

Source

2013-11-26 09:19:03 erny

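The OP is almost certainly using Python 3: note print used as a function and the b'' literal. Also, the encoding of a text file doesn't necessarily follow $LANG; IDEs and text editors often let you set it in their configuration, and they have their own defaults. – lvc

I'm using Python 3.3, and the default encoding is UTF-8 – alvinshih

Sorry, I didn't read the question properly. Does encode('ascii', 'backslashreplace') do the trick? – erny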

-1

Try decoding first, like: s.decode('utf-8').encode('utf-16be')?

Source
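
This route only applies when s is a bytes object; in Python 3 a str has no decode method. A minimal sketch, assuming the input arrives as UTF-8 encoded bytes (the variable names are illustrative only):

```python
# Stand-in for bytes read from a UTF-8 encoded file or socket.
utf8_bytes = "index=索引?".encode('utf-8')

text = utf8_bytes.decode('utf-8')       # bytes -> str
utf16_bytes = text.encode('utf-16-be')  # str -> UTF-16BE bytes, no BOM
print(utf16_bytes)

# A str has no decode() method in Python 3.
print(hasattr(text, 'decode'))  # False
```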

2013-11-26 09:07:57 greg

The parens on print imply Python 3.x. –
