Saving a UTF-8 CSV with Python

I've been struggling with this and have read a lot of threads, but I can't seem to get it to work. I need to save a UTF-8 CSV file.
First, here is my super-simple approach:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
import sys
import codecs
f = codecs.open("output.csv", "w", "utf-8-sig")
writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")]
writer.writerow(cells)
```
This results in an error:
```
Traceback (most recent call last):
  File "./makesimplecsv.py", line 10, in <module>
    cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 1: ordinal not in range(128)
```
I've also tried the UnicodeWriter class listed in the Python documentation (https://docs.python.org/2/library/csv.html#examples):
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
import sys
import codecs
import cStringIO


class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

f = codecs.open("output.csv", "w", "utf-8-sig")
writer = UnicodeWriter(f)
cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")]
writer.writerow(cells)
```
This results in the same error:
```
Traceback (most recent call last):
  File "./makesimplecsvwithunicodewriter.sh", line 40, in <module>
    cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 1: ordinal not in range(128)
```
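I went through the checklist I found in other, similar questions:
- All my files have an encoding declaration.
- I'm opening the file for writing as UTF-8.
- I encode the individual strings as UTF-8 before passing them to the CSV writer.
- I've tried both with and without adding a UTF-8 BOM, but it doesn't seem to make a difference, and from what I've read it shouldn't really matter.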
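The third checklist item is where the traceback comes from. In Python 2, a literal such as "nǐ hǎo" in a file declared as UTF-8 is already a byte string (`str`), and calling `.encode("utf-8")` on a byte string makes Python 2 first decode it with the default `ascii` codec. Since Python 2 is rarely available today, the sketch below assumes Python 3 and performs that hidden decode step explicitly; the variable names `raw`, `message` and `text` are illustrative only.

```python
# Assumption: Python 3 is used only so this sketch runs today; the bytes below
# are what the Python 2 str literal "nǐ hǎo" holds in a UTF-8 source file.
raw = "nǐ hǎo".encode("utf-8")
print(raw)  # b'n\xc7\x90 h\xc7\x8eo'  (0xc7 is the byte at position 1)

# Python 2's str.encode("utf-8") on a byte string first runs the equivalent of
# raw.decode("ascii"); doing that step explicitly reproduces the traceback text.
message = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode("utf-8")
```

In other words, these literals need `.decode("utf-8")` to become unicode text; calling `.encode()` on them only triggers the implicit ASCII decode that fails on byte 0xc7.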
Any ideas on what I'm doing wrong?
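Thanks! Based on your feedback, I was able to get this working. I used the UnicodeWriter approach, switched the encode() calls to decode(), and used the standard open() function to get the file object to write to. I'll update the question with the working solution for future reference. – antun
@antun: If you feel it's necessary, you can add your own solution as a new answer; the question itself should remain just a question. –
OK, I'll add it as an answer. – antun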
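antun's fix (decode the UTF-8 byte strings to unicode, let the writer encode them, and write to a plain `open()` file object) relies on Python 2's UnicodeWriter recipe. For comparison, here is a minimal sketch assuming Python 3, whose `csv` module writes `str` objects directly, so no wrapper class is needed. The names `byte_cells` and `data` are illustrative, and `output.csv` is reused from the question.

```python
import csv

# Assumption: Python 3. These byte strings mimic what the Python 2 literals held;
# decoding them mirrors antun's switch from encode() to decode().
byte_cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")]
cells = [b.decode("utf-8") for b in byte_cells]

# "utf-8-sig" prepends a UTF-8 BOM; newline="" is what the csv docs require
# for file objects handed to csv.writer.
with open("output.csv", "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(cells)

# Inspect the raw bytes that ended up on disk (BOM, UTF-8 text, "\r\n").
with open("output.csv", "rb") as f:
    data = f.read()
print(data)
```

Reading the file back with `encoding="utf-8-sig"` strips the BOM again, so the row round-trips as the original three strings.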