2014-07-25 65 views
0

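Saving a UTF-8 CSV with Python

I've been struggling with this for a while and have read a lot of threads, but I can't seem to get it working. I need to save a UTF-8 CSV file.

First, here is my super simple approach: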

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

import csv 
import sys 
import codecs 

f = codecs.open("output.csv", "w", "utf-8-sig") 
writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL) 
cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")] 
writer.writerow(cells) 

This results in an error:

Traceback (most recent call last): 
    File "./makesimplecsv.py", line 10, in <module> 
    cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")] 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 1: ordinal not in range(128) 

I have also tried the UnicodeWriter class listed in the Python documentation (https://docs.python.org/2/library/csv.html#examples):

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

import csv 
import sys 
import codecs 
import cStringIO 

class UnicodeWriter: 
    """ 
    A CSV writer which will write rows to CSV file "f", 
    which is encoded in the given encoding. 
    """ 

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

f = codecs.open("output.csv", "w", "utf-8-sig") 
writer = UnicodeWriter(f) 
cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")] 
writer.writerow(cells) 

This results in the same error:

Traceback (most recent call last): 
    File "./makesimplecsvwithunicodewriter.sh", line 40, in <module> 
    cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")] 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 1: ordinal not in range(128) 

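I thought I'd go through the checklist of things I found in other similar questions:

  • My file has an encoding declaration.
  • I'm opening the file for writing with UTF-8.
  • I'm encoding the individual strings as UTF-8 before passing them to the CSV writer.
  • I've tried with and without adding a UTF-8 BOM, but from what I've read that doesn't seem to make any difference, or indeed to be critical.

Any ideas on what I'm doing wrong?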

Answers

3

You are writing encoded byte strings to your CSV file. There is little point in doing so when Unicode objects are expected.
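As for why this surfaces as a UnicodeDecodeError even though the code calls encode(): in Python 2, a plain "..." literal is already a byte string, and calling .encode() on it first decodes it implicitly with the default ASCII codec. Here is a minimal sketch that makes that hidden step explicit. The variable names are illustrative, and the explicit decode("ascii") only stands in for what Python 2 does behind the scenes:

```python
# -*- coding: utf-8 -*-

# The bytes that a Python 2 "nǐ hǎo" literal holds in a UTF-8 source file.
raw = u"nǐ hǎo".encode("utf-8")

try:
    # Python 2 performs this decode implicitly when .encode() is
    # called on a byte string; here it is spelled out.
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The message should match the traceback in the question: byte 0xc7 is the first byte of the UTF-8 encoding of ǐ.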

Don't encode, decode:

cells = ["hello".decode("utf-8"), "nǐ hǎo".decode("utf-8"), "你好".decode("utf-8")] 

Or use u'...' Unicode string literals:

cells = [u"hello", u"nǐ hǎo", u"你好"] 

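You cannot use a codecs.open() file object with the Python 2 csv module. Either use the UnicodeWriter approach (with a regular file object) and pass in Unicode objects, or encode your cells to byte strings and use a csv.writer() object directly (again with a regular file object). That is all UnicodeWriter does: it passes encoded byte strings to the csv.writer() object.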
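The second option, encoding the cells yourself and using csv.writer() directly, could look roughly like the sketch below. The temporary path and variable names are hypothetical, and the Python 3 branch is an addition so the snippet also runs on newer interpreters; the thread itself only concerns Python 2.

```python
# -*- coding: utf-8 -*-
import csv
import os
import sys
import tempfile

cells = [u"hello", u"nǐ hǎo", u"你好"]  # Unicode objects, not byte strings
path = os.path.join(tempfile.mkdtemp(), "output.csv")  # hypothetical location

if sys.version_info[0] == 2:
    # Python 2: csv.writer expects byte strings and a regular binary file.
    with open(path, "wb") as f:
        csv.writer(f).writerow([c.encode("utf-8") for c in cells])
else:
    # Python 3 (assumption, not covered in the thread): csv.writer takes
    # text and the file object performs the encoding.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerow(cells)

# Read the raw bytes back to confirm the file is valid UTF-8.
with open(path, "rb") as f:
    written = f.read()
```

Either branch encodes each cell exactly once, which is the step the question's code got wrong by encoding strings that were already bytes.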

+0

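Thanks! Based on your feedback I was able to get this working. I used the UnicodeWriter approach, switched the encode() calls to decode(), and used the standard open() function to get the file object to write to. I'll update the question with the working solution for future reference. – antun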

+1

@antun: If you feel it's necessary, you can add your own solution as a new answer; the question should remain just a question. –

+0

OK, I'll add it as an answer. – antun

1

Update - Solution

Thanks to the accepted answer, I was able to get this working. Here is the complete example for future reference:

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

import csv 
import sys 
import codecs 
import cStringIO 

class UnicodeWriter: 
    """ 
    A CSV writer which will write rows to CSV file "f", 
    which is encoded in the given encoding. 
    """ 

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

# Regular file object; the Python 2 csv docs recommend binary mode
f = open("output.csv", "wb")

writer = UnicodeWriter(f) 
cells = ["hello".decode("utf-8"), "nǐ hǎo".decode("utf-8"), "你好".decode("utf-8")] 
writer.writerow(cells)
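The question originally opened the file with the "utf-8-sig" codec to get a BOM, which the working example above no longer writes. For readers on Python 3 (an assumption; the thread itself targets Python 2), the csv module accepts text directly, so a sketch along these lines may be enough without UnicodeWriter. The temporary path and variable names are hypothetical:

```python
import codecs
import csv
import os
import tempfile

cells = ["hello", "nǐ hǎo", "你好"]
path = os.path.join(tempfile.mkdtemp(), "output.csv")  # hypothetical path

# Python 3 only: "utf-8-sig" writes a BOM once at the start of the file;
# newline="" lets the csv module control line endings itself.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerow(cells)

# Inspect the raw bytes to see whether the BOM is present.
with open(path, "rb") as f:
    raw = f.read()
has_bom = raw.startswith(codecs.BOM_UTF8)

# Reading back with the same codec strips the BOM again.
with open(path, "r", encoding="utf-8-sig", newline="") as f:
    rows = list(csv.reader(f))
```

Dropping "-sig" from the encoding name gives the same file without the BOM, which matches what the Python 2 solution above produces.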