Python DictWriter writing UTF-8 encoded CSV files

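I have a list of dictionaries containing unicode strings.
csv.DictWriter can write a list of dictionaries to a CSV file.
I want the CSV file to be encoded in UTF-8.
The csv module cannot convert unicode strings to UTF-8 by itself.
The csv module documentation has an example for converting everything to UTF-8: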
```
def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')
```
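It also has a UnicodeWriter class.

But... how do I make DictWriter work with these? Wouldn't they have to inject themselves in the middle of it, catching the disassembled dictionaries and encoding them before they are written to the file? I don't get it.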

2011-04-30 endolith

If you are using Python 2.7 or later, use a dict comprehension to remap the dictionary to UTF-8 before passing it to DictWriter:

# coding: utf-8
import csv

D = {'name': u'馬克', 'pinyin': u'mǎkè'}
with open('out.csv', 'wb') as f:
    # BOM (optional; Excel needs it to open a UTF-8 file properly)
    f.write(u'\ufeff'.encode('utf8'))
    w = csv.DictWriter(f, sorted(D.keys()))
    w.writeheader()
    w.writerow({k: v.encode('utf8') for k, v in D.items()})

You can use this idea to update UnicodeWriter to DictUnicodeWriter:

# coding: utf-8
import csv
import cStringIO
import codecs

class DictUnicodeWriter(object):

    def __init__(self, f, fieldnames, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.DictWriter(self.queue, fieldnames, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, D):
        self.writer.writerow({k: v.encode("utf-8") for k, v in D.items()})
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for D in rows:
            self.writerow(D)

    def writeheader(self):
        # The header stays in the queue and reaches the stream with the first row
        self.writer.writeheader()

D1 = {'name': u'馬克', 'pinyin': u'Mǎkè'}
D2 = {'name': u'美國', 'pinyin': u'Měiguó'}
with open('out.csv', 'wb') as f:
    # BOM (optional; Excel needs it to open a UTF-8 file properly)
    f.write(u'\ufeff'.encode('utf8'))
    w = DictUnicodeWriter(f, sorted(D1.keys()))
    w.writeheader()
    w.writerows([D1, D2])
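
On the BOM written by hand in both snippets: those bytes are simply U+FEFF encoded as UTF-8, and the standard library exposes the same value as `codecs.BOM_UTF8`. A minimal sketch that should run on both Python 2.7 and Python 3; `starts_with_bom` is a hypothetical helper name used only for illustration:

```python
import codecs

# The BOM the snippets write manually is the same as the stdlib constant.
BOM = u'\ufeff'.encode('utf-8')
assert BOM == codecs.BOM_UTF8


def starts_with_bom(data):
    """Return True if a byte string begins with the UTF-8 BOM (hypothetical helper)."""
    return data.startswith(codecs.BOM_UTF8)


with_bom = codecs.BOM_UTF8 + b'name,pinyin\r\n'
without_bom = b'name,pinyin\r\n'
```

A check like this can be handy when deciding whether a file produced elsewhere already carries the marker, so that it is not written twice.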

2011-04-30 01:19:59

I guess upgrading from Python(x,y) 2.6.6.0 would make things simpler. :) – endolith 2011-04-30 01:50:41

@endolith: on Python 2.6 you could use `dict((k, v.encode('utf-8') if isinstance(v, unicode) else v) for k, v in D.iteritems())` instead of the dict comprehension. – jfs 2011-04-30 05:37:38

The `if isinstance(v, unicode)` part is essential! – reubano 2014-03-06 07:42:43
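
The guard from the comments above can be wrapped in a small function so that numbers and other non-text values pass through untouched. This is only a sketch: `encode_row` is a hypothetical name, and `type(u'')` is used so the same code runs on Python 2 (where it is `unicode`) and Python 3 (where it is `str`). On Python 3 you would normally not pre-encode values for the csv module at all.

```python
def encode_row(row, encoding='utf-8'):
    """Return a copy of `row` with text values encoded; other values are left as-is."""
    text_type = type(u'')  # `unicode` on Python 2, `str` on Python 3
    return dict(
        (key, value.encode(encoding) if isinstance(value, text_type) else value)
        for key, value in row.items()
    )


row = {'name': u'Bj\xf6rk', 'albums': 9}
encoded = encode_row(row)
```

Without the `isinstance` check, the integer value would raise `AttributeError`, because `int` has no `encode` method.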

The idea is to pass your content through utf_8_encoder when you hook csv.writer up to it, because it gives you back (UTF-8) encoded content.

2011-04-30 00:47:45

You can use a proxy class that encodes dict values as needed, like this:

# -*- coding: utf-8 -*-
import csv

d = {'a': 123, 'b': 456, 'c': u'Non-ASCII: проверка'}

class DictUnicodeProxy(object):
    def __init__(self, d):
        self.d = d

    def __iter__(self):
        return self.d.__iter__()

    def get(self, item, default=None):
        i = self.d.get(item, default)
        # Encode only unicode values; leave everything else untouched
        if isinstance(i, unicode):
            return i.encode('utf-8')
        return i

with open('some.csv', 'wb') as f:
    writer = csv.DictWriter(f, ['a', 'b', 'c'])
    writer.writerow(DictUnicodeProxy(d))

2011-04-30 01:06:00

You can convert the values to UTF-8 on the fly as you pass the dict to DictWriter.writerow(). For example:

import csv

rows = [
    {'name': u'Anton\xedn Dvo\u0159\xe1k', 'country': u'\u010cesko'},
    {'name': u'Bj\xf6rk Gu\xf0mundsd\xf3ttir', 'country': u'\xcdsland'},
    {'name': u'S\xf8ren Kierkeg\xe5rd', 'country': u'Danmark'},
]

# implement this wrapper on 2.6 or lower if you need to output a header
class DictWriterEx(csv.DictWriter):
    def writeheader(self):
        header = dict(zip(self.fieldnames, self.fieldnames))
        self.writerow(header)

out = open('foo.csv', 'wb')
writer = DictWriterEx(out, fieldnames=['name', 'country'])
# DictWriter.writeheader() was added in 2.7 (use class above for <= 2.6)
writer.writeheader()
for row in rows:
    writer.writerow(dict((k, v.encode('utf-8')) for k, v in row.iteritems()))
out.close()

Output foo.csv:

name,country 
Antonín Dvořák,Česko 
Björk Guðmundsdóttir,Ísland 
Søren Kierkegård,Danmark

2011-04-30 01:06:46 samplebias

Nice one. I like implementing it as an inline writer function. – shahjapan 2013-02-11 15:04:45

`writer.writerow(dict((k, v.encode('utf-8') if type(v) is unicode else v) for k, v in row.iteritems()))` encodes only the unicode values, because int and list values have no `encode` method. – 2014-11-06 02:12:28

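My solution is a bit different. While all the solutions above focus on making the dict Unicode-compatible, mine makes DictWriter itself Unicode-compatible. This approach is even suggested in the Python docs (1).

The classes UTF8Recoder, UnicodeReader and UnicodeWriter are taken from the Python docs. UnicodeWriter.writerow has also been changed a bit.

Use it like a regular DictWriter/DictReader.

Here is the code: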

import csv, codecs, cStringIO

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        # unicode(s) also converts non-string values (e.g. numbers) before encoding
        self.writer.writerow([unicode(s).encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

class UnicodeDictWriter(csv.DictWriter, object):
    def __init__(self, f, fieldnames, restval="", extrasaction="raise", dialect="excel", *args, **kwds):
        # Pass the caller's arguments through rather than hard-coded defaults
        super(UnicodeDictWriter, self).__init__(f, fieldnames, restval, extrasaction, dialect, *args, **kwds)
        # Swap the underlying writer for the Unicode-aware one
        self.writer = UnicodeWriter(f, dialect, **kwds)

2013-09-30 14:48:47 b1r3k

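There is a simple workaround using the wonderful UnicodeCSV module. Once you have it installed, just change the line

import csv

to

import unicodecsv as csv

and it automatically starts playing nice with UTF-8.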

Note: switching to Python 3 also gets rid of this problem (thanks to jamescampbell for the tip). It is something you should do anyway.
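
For reference, here is roughly what the Python 3 route looks like. It is a minimal sketch that reuses the sample rows from the earlier answer and writes to a temporary directory (an assumption made only so the example is self-contained). In Python 3 the csv module works on text, so the encoding is set once when opening the file and no per-value `encode` calls are needed:

```python
import csv
import os
import tempfile

rows = [
    {'name': '馬克', 'pinyin': 'Mǎkè'},
    {'name': '美國', 'pinyin': 'Měiguó'},
]

# Temporary location, used here only to keep the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), 'out.csv')

# newline='' lets the csv module control the line endings itself.
with open(path, 'w', encoding='utf-8', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'pinyin'])
    writer.writeheader()
    writer.writerows(rows)

# Read the raw bytes back to confirm they decode as UTF-8.
with open(path, 'rb') as f:
    raw = f.read()
lines = raw.decode('utf-8').splitlines()
```

Opening the file with `encoding='utf-8-sig'` instead should prepend the BOM that the earlier answer writes by hand for Excel.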

2016-03-07 00:06:31 rlafuente

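omfg finally - what a nightmare this was until now – 2016-06-18 07:24:53

This should be the accepted answer - so simple, and it works like a charm – 2016-10-23 16:38:04

You no longer need to do this in Python 3.x – jamescampbell 2017-12-15 17:46:25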
