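Saving a UTF-8 CSV with Python

I've been struggling with this for a while and have read a lot of threads, but I can't seem to get it working. I need to save a UTF-8 CSV file.

First, here is my super simple approach: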

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

import csv 
import sys 
import codecs 

f = codecs.open("output.csv", "w", "utf-8-sig") 
writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL) 
cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")] 
writer.writerow(cells)

This results in an error:

Traceback (most recent call last): 
    File "./makesimplecsv.py", line 10, in <module> 
    cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")] 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 1: ordinal not in range(128)

I have also tried the UnicodeWriter class listed in the Python documentation (https://docs.python.org/2/library/csv.html#examples):

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

import csv 
import sys 
import codecs 
import cStringIO 

class UnicodeWriter: 
    """ 
    A CSV writer which will write rows to CSV file "f", 
    which is encoded in the given encoding. 
    """ 

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds): 
     # Redirect output to a queue 
     self.queue = cStringIO.StringIO() 
     self.writer = csv.writer(self.queue, dialect=dialect, **kwds) 
     self.stream = f 
     self.encoder = codecs.getincrementalencoder(encoding)() 

    def writerow(self, row): 
     self.writer.writerow([s.encode("utf-8") for s in row]) 
     # Fetch UTF-8 output from the queue ... 
     data = self.queue.getvalue() 
     data = data.decode("utf-8") 
     # ... and reencode it into the target encoding 
     data = self.encoder.encode(data) 
     # write to the target stream 
     self.stream.write(data) 
     # empty queue 
     self.queue.truncate(0) 

    def writerows(self, rows): 
     for row in rows: 
      self.writerow(row) 

f = codecs.open("output.csv", "w", "utf-8-sig") 
writer = UnicodeWriter(f) 
cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")] 
writer.writerow(cells)

This results in the same error:

Traceback (most recent call last): 
    File "./makesimplecsvwithunicodewriter.sh", line 40, in <module> 
    cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")] 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 1: ordinal not in range(128)

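I thought I'd go through the checklist of things I found in other, similar questions:

My file has an encoding declaration.
I am opening the file for writing as UTF-8.
I am encoding the individual strings as UTF-8 before passing them to the CSV writer.
I have tried with and without the UTF-8 BOM, but from what I have read this does not seem to make any difference, or to matter at all.

Any idea what I am doing wrong?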


2014-07-25 antun

You are writing encoded byte strings to your CSV file. There is little point in doing that when Unicode objects are expected.

Don't encode, decode:

cells = ["hello".decode("utf-8"), "nǐ hǎo".decode("utf-8"), "你好".decode("utf-8")]

Or use u'...' Unicode string literals:

cells = [u"hello", u"nǐ hǎo", u"你好"]
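
For context, the traceback in the question arises because in Python 2 calling .encode() on a byte string first decodes it implicitly with the ASCII codec. The following is a minimal sketch that reproduces the same message with an explicit decode; the variable names are illustrative, and the byte literal is simply the UTF-8 encoding of "nǐ hǎo":

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes of u"nǐ hǎo", spelled out so the sketch runs on Python 2 and 3.
raw = b"n\xc7\x90 h\xc7\x8eo"

try:
    # Python 2 performs this ASCII decode implicitly when .encode() is
    # called on a byte string; here it is made explicit.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)  # names byte 0xc7 at position 1, as in the traceback

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode("utf-8")
```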

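You cannot use a codecs.open() file object with the Python 2 csv module. Either use the UnicodeWriter approach (with a regular file object) and pass in Unicode objects, or encode your cells to byte strings and use a csv.writer() object directly (again with a regular file object), since that is all UnicodeWriter does: it passes encoded byte strings to the csv.writer() object.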
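
The second option might look roughly like the sketch below. The helper name write_utf8_csv is hypothetical, and the Python 3 branch is an addition for readers on newer interpreters rather than part of the answer; there the csv module accepts text directly, so no manual encoding is needed.

```python
# -*- coding: utf-8 -*-
import csv
import io
import sys


def write_utf8_csv(path, rows):
    """Write rows of unicode strings to path as UTF-8 CSV (hypothetical helper)."""
    if sys.version_info[0] == 2:
        # Python 2: csv.writer wants byte strings and a regular file object
        # opened in binary mode, so encode every cell before writing.
        with open(path, "wb") as f:
            writer = csv.writer(f)
            for row in rows:
                writer.writerow([cell.encode("utf-8") for cell in row])
    else:
        # Python 3: open a text file with an explicit encoding; newline=""
        # lets the csv module control line endings itself.
        with io.open(path, "w", encoding="utf-8", newline="") as f:
            csv.writer(f).writerows(rows)
```

Calling write_utf8_csv("output.csv", [[u"hello", u"nǐ hǎo", u"你好"]]) would write a single row to output.csv.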


2014-07-25 15:28:40

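Thanks! Based on your feedback I was able to get this working. I used the UnicodeWriter approach, switched the encode() calls to decode(), and used the standard open() function to get the file object to write to. I'll update the question with the working solution for future reference. – antun

@antun: If you feel it is needed, you can add your own solution as a new answer; the question should remain just a question. –

OK, I'll add it as an answer. – antun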

Update - Solution

Thanks to the accepted answer, I was able to get this working. Here is the complete example for future reference:

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

import csv 
import sys 
import codecs 
import cStringIO 

class UnicodeWriter: 
    """ 
    A CSV writer which will write rows to CSV file "f", 
    which is encoded in the given encoding. 
    """ 

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds): 
     # Redirect output to a queue 
     self.queue = cStringIO.StringIO() 
     self.writer = csv.writer(self.queue, dialect=dialect, **kwds) 
     self.stream = f 
     self.encoder = codecs.getincrementalencoder(encoding)() 

    def writerow(self, row): 
     self.writer.writerow([s.encode("utf-8") for s in row]) 
     # Fetch UTF-8 output from the queue ... 
     data = self.queue.getvalue() 
     data = data.decode("utf-8") 
     # ... and reencode it into the target encoding 
     data = self.encoder.encode(data) 
     # write to the target stream 
     self.stream.write(data) 
     # empty queue 
     self.queue.truncate(0) 

    def writerows(self, rows): 
     for row in rows: 
      self.writerow(row) 

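# Binary mode, as the Python 2 csv docs advise, so newlines are not translated
f = open("output.csv", "wb")

writer = UnicodeWriter(f)
# Decode the UTF-8 byte-string literals into unicode objects before writing
cells = ["hello".decode("utf-8"), "nǐ hǎo".decode("utf-8"), "你好".decode("utf-8")]
writer.writerow(cells)
f.close()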
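
The question originally opened the file with "utf-8-sig", while the solution above writes no BOM. If a BOM is wanted (an assumption about the use case, not something the accepted answer requires), one possible sketch is to write codecs.BOM_UTF8 to the regular file object before any rows. The helper name open_with_bom is hypothetical:

```python
import codecs


def open_with_bom(path):
    """Open path for binary writing and emit the UTF-8 BOM first (hypothetical helper)."""
    f = open(path, "wb")
    f.write(codecs.BOM_UTF8)  # the three bytes EF BB BF
    return f
```

In the script above this would stand in for the plain open() call, for example f = open_with_bom("output.csv"), after which UnicodeWriter(f) is used unchanged.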


2014-07-25 16:28:04 antun
