Python UTF-16 output and Windows line endings: a bug?

test.py

import sys 
import codecs 

sys.stdout = codecs.getwriter('utf-16')(sys.stdout) 

print "test1" 
print "test2"

Then I run it:

test.py > test.txt

With Python 2.6 on Windows 2000, I found that each newline is written as the byte sequence \x0D\x0A\x00, which is of course wrong for UTF-16.

Am I missing something, or is this a bug?

2009-07-23 Craig McQueen

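On Mac OS X it works fine: "fe ff 00" are the first three bytes. – 2009-07-23 05:57:51

Interesting information, but I don't see how it relates to the question. I think this problem only matters on platforms with Windows-style (CR-LF) line endings. – 2009-07-23 06:12:17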

Try this:

import sys 
import codecs 

if sys.platform == "win32": 
    import os, msvcrt 
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY) 

class CRLFWrapper(object):
    def __init__(self, output):
        self.output = output

    def write(self, s):
        self.output.write(s.replace("\n", "\r\n"))

    def __getattr__(self, key):
        return getattr(self.output, key)

sys.stdout = CRLFWrapper(codecs.getwriter('utf-16')(sys.stdout)) 
print "test1" 
print "test2"
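
To check what this wrapper actually emits without touching the real stdout, here is a minimal sketch. It is written in Python 3 syntax (the code above is Python 2), it uses an in-memory `io.BytesIO` as a stand-in for a binary-mode stdout, and it uses the `utf-16-le` codec instead of `utf-16` so that no BOM is written and the byte order is fixed. Those three choices are assumptions made for the demonstration, not part of the answer above.

```python
import codecs
import io


class CRLFWrapper(object):
    """Same wrapper as above: expand "\n" to "\r\n" BEFORE encoding."""

    def __init__(self, output):
        self.output = output

    def write(self, s):
        self.output.write(s.replace("\n", "\r\n"))

    def __getattr__(self, key):
        return getattr(self.output, key)


# Assumption: BytesIO stands in for stdout after it was switched to binary mode.
raw = io.BytesIO()
out = CRLFWrapper(codecs.getwriter("utf-16-le")(raw))
out.write("test1\n")

result = raw.getvalue()
# Each of CR and LF is now a full two-byte UTF-16 code unit.
expected = b"t\x00e\x00s\x00t\x001\x00\r\x00\n\x00"
```

The point of the design is the ordering: the newline is expanded while the data is still text, so the encoder sees both characters and nothing downstream has to touch the bytes.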

2009-07-23 08:44:36

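The newline translation happens inside the stdout file. You are writing "test1\n" to sys.stdout (a StreamWriter). The StreamWriter translates this to "t\x00e\x00s\x00t\x001\x00\n\x00" and sends it to the real file, the original sys.stdout.

That file doesn't know you have converted the data to UTF-16; all it knows is that any \n value in the output stream needs to be converted to \x0D\x0A, which results in the output you are seeing.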
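
The effect can be reproduced on any platform with a small simulation. This is only a sketch: `text_mode_translate` is a hypothetical helper that imitates the byte-level translation a Windows text-mode file performs, and the code uses Python 3 syntax and the `utf-16-le` codec (no BOM, fixed byte order) to keep the bytes easy to read.

```python
def text_mode_translate(data):
    """Hypothetical stand-in for a text-mode file: each 0x0A byte becomes 0x0D 0x0A."""
    return data.replace(b"\n", b"\r\n")


# What the StreamWriter hands to the underlying file.
encoded = "test1\n".encode("utf-16-le")

# What ends up on disk after byte-level newline translation.
corrupted = text_mode_translate(encoded)

# What correct UTF-16 with a CR-LF line ending would look like.
correct = "test1\r\n".encode("utf-16-le")
```

Here `corrupted` ends with the bytes 0D 0A 00, the sequence reported in the question, while `correct` ends with 0D 00 0A 00. The corrupted stream also has an odd number of bytes, so every code unit after the first newline is misaligned.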

2009-07-23 06:15:35

Thanks, that is insightful and points me in the right direction. – 2009-07-23 07:10:38

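So far I have found two solutions, but neither of them produces UTF-16 output with Windows-style line endings.

First, redirect Python's print statement to a file opened with UTF-16 encoding (this outputs Unix-style line endings):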

import sys 
import codecs 

sys.stdout = codecs.open("outputfile.txt", "w", encoding="utf16") 

print "test1" 
print "test2"

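Second, redirect to stdout with UTF-16 encoding and without the corruption caused by line-ending translation (this also outputs Unix-style line endings; thanks to this ActiveState recipe):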

import sys 
import codecs 

sys.stdout = codecs.getwriter('utf-16')(sys.stdout) 

if sys.platform == "win32": 
    import os, msvcrt 
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY) 

print "test1" 
print "test2"
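
For comparison only: the code in this thread targets Python 2.6. In Python 3 the `io.TextIOWrapper` class takes a `newline` argument that translates "\n" on write before the text is encoded, which would give UTF-16 with CR-LF endings. The sketch below is an assumption-laden illustration, not something tested on Windows 2000: it writes to an in-memory `io.BytesIO` instead of the real stdout and uses `utf-16-le` so that no BOM appears in the output.

```python
import io

# Assumption: BytesIO stands in for a binary stream such as sys.stdout.buffer.
raw = io.BytesIO()

# newline="\r\n" makes the wrapper turn each "\n" written into "\r\n"
# at the text level, before encoding to UTF-16.
out = io.TextIOWrapper(raw, encoding="utf-16-le", newline="\r\n")
out.write("test1\n")
out.write("test2\n")
out.flush()

data = raw.getvalue()
```

Because the translation happens on characters rather than bytes, `data` decodes cleanly back to "test1\r\ntest2\r\n".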

2009-07-23 08:07:41
