Converting from utf-16 to utf-8 in Python 3

I'm programming in Python 3 and I've run into a small problem that I can't find any reference to on the net.

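As far as I understand, the default string is utf-16, but I have to work with utf-8, and I can't find the command that converts from the default string to utf-8. I'd appreciate your help very much.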

2010-06-29 idan

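In Python 3 there are two different datatypes that matter when you work with strings. First there is the string class (str), an object that represents Unicode code points. The important thing to get is that this string is not a bunch of bytes, but really a sequence of characters. Secondly, there is the bytes class, which is just a sequence of bytes, often representing a string stored in some encoding (like utf-8 or iso-8859-15).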
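To make the difference concrete, here is a minimal sketch (the sample word 'ćma' is just an illustration I picked, not anything from the question). It shows that the length of a str counts characters, while the length of the encoded bytes depends on the encoding:

```python
# A str holds code points; a bytes object holds raw byte values.
text = 'ćma'                     # str: three characters
data = text.encode('utf-8')      # bytes: 'ć' takes two bytes in UTF-8

print(type(text).__name__, len(text))   # str 3
print(type(data).__name__, len(data))   # bytes 4
print(data)                             # b'\xc4\x87ma'
```

The same three characters would take a different number of bytes in another encoding, which is why the string itself has no "encoding" you need to care about.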

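What does this mean for you? As far as I understand, you want to read and write utf-8 files. Let's make a program that replaces every 'ć' with a 'ç' character.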

def main():
    # Open the output file first. Passing encoding='utf-8' tells Python
    # that everything we write to this file should be encoded as utf-8.
    with open('output_file', 'w', encoding='utf-8') as out_file:
        # Read every line. We give open() the encoding so it returns Unicode strings.
        with open('input_file', encoding='utf-8') as in_file:
            for line in in_file:
                # Replace the characters we want. A string literal in Python 3 is
                # already a Unicode string, so no worries about encoding there.
                # The file object encodes the text to utf-8 when print() writes it.
                # end='' avoids doubling the newline that each line already has.
                print(line.replace('ć', 'ç'), end='', file=out_file)

if __name__ == '__main__':
    main()

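So when should you use bytes? Not often. One example I can think of is when you read something from a socket. If you have this in a bytes object, you can turn it into a Unicode string with bytes.decode('encoding'), and go the other way with str.encode('encoding'). But as said, you probably won't need it.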
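A minimal sketch of that round trip, assuming the bytes you received are utf-8 encoded text (the variable names and the sample bytes are made up for illustration; no real socket is involved):

```python
# Pretend these bytes arrived from a socket; assume they are UTF-8 encoded text.
received = b'\xc4\x87ma'

# bytes -> str: decode with the encoding the sender used.
text = received.decode('utf-8')

# Work on the str, then go back str -> bytes before sending a reply.
reply = text.replace('ć', 'ç').encode('utf-8')

print(text)    # ćma
print(reply)   # b'\xc3\xa7ma'
```

If you decode with the wrong encoding you either get an error (UnicodeDecodeError) or the wrong characters, so you do need to know what the other side is sending.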

Still, because it is interesting, here is the hard way, where you do all the encoding and decoding yourself:

def main():
    # Open the output file in binary mode, so we write bytes to it instead of strings.
    with open('output_file', 'wb') as out_file:
        # Read every line. Again we open the file in binary mode, so we get bytes.
        with open('input_file', 'rb') as in_file:
            for line_bytes in in_file:
                # Convert the bytes to a string.
                line_string = line_bytes.decode('utf-8')
                # Replace the characters we want.
                line_string = line_string.replace('ć', 'ç')
                # Make bytes again to write out.
                out_bytes = line_string.encode('utf-8')
                # Write the bytes. print() would write text, not bytes,
                # so a binary file needs write() instead.
                out_file.write(out_bytes)

if __name__ == '__main__':
    main()
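If what you actually have is a file stored as utf-16 and you want a utf-8 copy of it, the same idea applies: decode with one encoding while reading, encode with the other while writing. Here is a rough sketch; transcode() is a helper name I made up, and the file names and the temporary directory only exist for the demo:

```python
import os
import tempfile

def transcode(src_path, dst_path, src_encoding='utf-16', dst_encoding='utf-8'):
    """Copy a text file, decoding with src_encoding and encoding with dst_encoding."""
    # newline='' keeps the line endings exactly as they are in the source file.
    with open(src_path, encoding=src_encoding, newline='') as src, \
         open(dst_path, 'w', encoding=dst_encoding, newline='') as dst:
        for line in src:
            dst.write(line)

# Demo: write a small UTF-16 file, convert it, and look at the resulting bytes.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'in_utf16.txt')
    dst_path = os.path.join(tmp, 'out_utf8.txt')
    with open(src_path, 'w', encoding='utf-16', newline='') as f:
        f.write('ćma\n')
    transcode(src_path, dst_path)
    with open(dst_path, 'rb') as f:
        result = f.read()

print(result)   # b'\xc4\x87ma\n'
```

The 'utf-16' codec writes a byte order mark at the start of the file and strips it again when reading, so it does not end up in the utf-8 output.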

A good read on this topic (string encodings) is http://www.joelonsoftware.com/articles/Unicode.html. Really recommended!

Source: http://docs.python.org/release/3.0.1/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit

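(P.S. As you can see, I didn't mention utf-16 in this post. I actually don't know whether Python uses it as its internal encoding or not, but that is totally irrelevant. The moment you are working with a string, you work with characters (code points), not bytes.)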


2010-06-29 11:40:02

Python does use UTF-16 as its internal encoding on Windows. On Linux, it uses UTF-32. – dan04 2010-06-30 01:33:32

Hi, thanks for your answer. dan04, do you know how I can tell it to use only utf-8? – idan 2010-07-16 09:07:48

@idan Why would you want to do that? Anyway, it is not possible unless you modify and recompile Python yourself ... – 2010-07-16 09:23:34
