2011-04-30 256 views
41
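Python DictWriter writing UTF-8 encoded CSV files

  1. I have a list of dictionaries containing unicode strings.
  2. csv.DictWriter can write a list of dictionaries to a CSV file.
  3. I want the CSV file to be encoded in UTF-8.
  4. The csv module cannot handle converting unicode strings to UTF-8.
  5. The csv module documentation has an example that converts everything to UTF-8: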

    def utf_8_encoder(unicode_csv_data):
        for line in unicode_csv_data:
            yield line.encode('utf-8')
    
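  6. It also has a UnicodeWriter class.

But... how do I make DictWriter work with these? Wouldn't they have to inject themselves in the middle, catching the disassembled dictionaries and encoding them before they are written to the file? I don't get it.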

Answers

71

If you are using Python 2.7 or later, use a dict comprehension to remap the dictionary to UTF-8 before passing it to DictWriter:

# coding: utf-8 
import csv 
D = {'name':u'馬克','pinyin':u'mǎkè'} 
f = open('out.csv','wb') 
f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly) 
w = csv.DictWriter(f,sorted(D.keys())) 
w.writeheader() 
w.writerow({k:v.encode('utf8') for k,v in D.items()}) 
f.close() 

You can use this idea to turn UnicodeWriter into a DictUnicodeWriter:

# coding: utf-8 
import csv 
import cStringIO 
import codecs 

class DictUnicodeWriter(object):

    def __init__(self, f, fieldnames, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.DictWriter(self.queue, fieldnames, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, D):
        self.writer.writerow({k:v.encode("utf-8") for k,v in D.items()})
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for D in rows:
            self.writerow(D)

    def writeheader(self):
        self.writer.writeheader()

D1 = {'name':u'馬克','pinyin':u'Mǎkè'} 
D2 = {'name':u'美國','pinyin':u'Měiguó'} 
f = open('out.csv','wb') 
f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly) 
w = DictUnicodeWriter(f,sorted(D1.keys()))
w.writeheader() 
w.writerows([D1,D2]) 
f.close() 
+0

I think downgrading to Python(x,y) 2.6.6.0 would make things simpler. :) – endolith 2011-04-30 01:50:41

+9

@endolith: on Python 2.6 you can use 'dict((k, v.encode("utf-8") if isinstance(v, unicode) else v) for k, v in D.iteritems())' instead of the dict comprehension. – jfs 2011-04-30 05:37:38

+4

The 'if isinstance(v, unicode)' part is essential! – reubano 2014-03-06 07:42:43
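To make the point of these comments concrete, here is a minimal sketch of a helper that encodes only the text values and leaves numbers and other objects alone. The name encode_row is hypothetical (it is not part of the csv module), and type(u'') is used so the same snippet runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: encode_row is a hypothetical helper, not part of the csv module.
text_type = type(u'')  # unicode on Python 2, str on Python 3


def encode_row(row, encoding='utf-8'):
    """Return a copy of row with text values encoded; other values are left alone."""
    return dict(
        (key, value.encode(encoding) if isinstance(value, text_type) else value)
        for key, value in row.items()
    )


row = {'id': 7, 'name': u'Bj\xf6rk'}
encoded = encode_row(row)  # 'id' stays an int, 'name' becomes a UTF-8 byte string
```

On Python 2 the result can be handed to DictWriter.writerow(); without the isinstance check, the integer value would raise AttributeError because it has no encode method.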

2

When you hook csv.writer up to your content, the idea is to pass the content through utf_8_encoder, since it gives you (UTF-8) encoded content.

6

You can use a proxy class that encodes the dictionary values as needed, for example:

# -*- coding: utf-8 -*- 
import csv 
d = {'a':123,'b':456, 'c':u'Non-ASCII: проверка'} 

class DictUnicodeProxy(object):
    def __init__(self, d):
        self.d = d
    def __iter__(self):
        return self.d.__iter__()
    def get(self, item, default=None):
        i = self.d.get(item, default)
        if isinstance(i, unicode):
            return i.encode('utf-8')
        return i

with open('some.csv', 'wb') as f: 
    writer = csv.DictWriter(f, ['a', 'b', 'c']) 
    writer.writerow(DictUnicodeProxy(d)) 
14

You can convert the values to UTF-8 on the fly, as you pass the dictionary to DictWriter.writerow(). For example:

import csv 

rows = [ 
    {'name': u'Anton\xedn Dvo\u0159\xe1k','country': u'\u010cesko'}, 
    {'name': u'Bj\xf6rk Gu\xf0mundsd\xf3ttir', 'country': u'\xcdsland'}, 
    {'name': u'S\xf8ren Kierkeg\xe5rd', 'country': u'Danmark'} 
    ] 

# implement this wrapper on 2.6 or lower if you need to output a header 
class DictWriterEx(csv.DictWriter):
    def writeheader(self):
        header = dict(zip(self.fieldnames, self.fieldnames))
        self.writerow(header)

out = open('foo.csv', 'wb') 
writer = DictWriterEx(out, fieldnames=['name','country']) 
# DictWriter.writeheader() was added in 2.7 (use class above for <= 2.6) 
writer.writeheader() 
for row in rows: 
    writer.writerow(dict((k, v.encode('utf-8')) for k, v in row.iteritems())) 
out.close() 

Output in foo.csv:

name,country 
Antonín Dvořák,Česko 
Björk Guðmundsdóttir,Ísland 
Søren Kierkegård,Danmark 
+0

Nice one. I like implementing it as a one-liner writer function. – shahjapan 2013-02-11 15:04:45

+6

'writer.writerow(dict((k, v.encode("utf-8") if type(v) is unicode else v) for k, v in row.iteritems()))' encodes only the unicode strings, because int/list values have no encode method. – 2014-11-06 02:12:28

1

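My solution is a bit different. While all the solutions above focus on making the dictionaries Unicode-compatible, mine makes DictWriter itself Unicode-compatible. This approach is even suggested in the Python docs (1).

The classes UTF8Recoder, UnicodeReader and UnicodeWriter are taken from the Python docs. UnicodeWriter->writerow has also been changed a little.

Use it like a regular DictWriter/DictReader.

Here is the code: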

import csv, codecs, cStringIO 

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([unicode(s).encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

class UnicodeDictWriter(csv.DictWriter, object):
    def __init__(self, f, fieldnames, restval="", extrasaction="raise", dialect="excel", *args, **kwds):
        # Pass the caller's arguments through instead of hard-coding the defaults
        super(UnicodeDictWriter, self).__init__(f, fieldnames, restval, extrasaction, dialect, *args, **kwds)
        # Replace the plain csv.writer with the Unicode-aware one
        self.writer = UnicodeWriter(f, dialect, **kwds)
31

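There is a simple workaround using the wonderful UnicodeCSV module. Once you have it installed, just change the line

import csv

to

import unicodecsv as csv

and it automatically starts playing nice with UTF-8.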

Note: switching to Python 3 also solves this problem (thanks to jamescampbell for the tip). That is what you should be doing anyway.
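As an illustration of the Python 3 note, here is a minimal sketch, assuming Python 3 and a hypothetical temporary output path. There the csv module works on text directly and the file object does the encoding, so no per-value encode step is needed. The 'utf-8-sig' codec is used here only to write the BOM that Excel looks for; plain 'utf-8' works the same way without it.

```python
import csv
import os
import tempfile

rows = [
    {'name': '馬克', 'pinyin': 'mǎkè'},
    {'name': '美國', 'pinyin': 'Měiguó'},
]

# Hypothetical output path, used only for this illustration
path = os.path.join(tempfile.mkdtemp(), 'out.csv')

# The file object handles the encoding; 'utf-8-sig' also writes a BOM.
# newline='' lets the csv module control the line endings itself.
with open(path, 'w', encoding='utf-8-sig', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'pinyin'])
    writer.writeheader()
    writer.writerows(rows)

# Read the raw bytes back to inspect what was written
with open(path, 'rb') as f:
    raw = f.read()
```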

+4

omfg finally - what a nightmare this was until now – 2016-06-18 07:24:53

+3

This should be the accepted answer - so simple, and it works like a charm – 2016-10-23 16:38:04

+1

You no longer need to do this in Python 3.x – jamescampbell 2017-12-15 17:46:25