2014-01-31

unicodecsv reader from a unicode string not working?

I can't read CSV data from a unicode string into Python with unicodecsv:

>>> import unicodecsv, StringIO 
>>> f = StringIO.StringIO(u'é,é') 
>>> r = unicodecsv.reader(f, encoding='utf-8') 
>>> row = r.next() 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/Users/guy/test/.env/lib/python2.7/site-packages/unicodecsv/__init__.py", line 101, in next 
    row = self.reader.next() 
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128) 

I guess the problem is in how I turn my unicode string into a StringIO file. The example on the python-unicodecsv GitHub page works fine:

>>> import unicodecsv 
>>> from cStringIO import StringIO 
>>> f = StringIO() 
>>> w = unicodecsv.writer(f, encoding='utf-8') 
>>> w.writerow((u'é', u'ñ')) 
>>> f.seek(0) 
>>> r = unicodecsv.reader(f, encoding='utf-8') 
>>> row = r.next() 
>>> print row[0], row[1] 
é ñ 

Trying the same thing with cStringIO fails, because cStringIO cannot accept unicode:

>>> from cStringIO import StringIO 
>>> f = StringIO(u'é') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128) 
(So why the example works, I don't know!)

I need to accept UTF-8 CSV input from a web textarea form field, so I can't just read it from a file.

Any ideas?

Answer


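unicodecsv reads and decodes byte strings for you. You are passing it unicode strings instead. On output, your unicode values are encoded to byte strings for you, using the configured codec. That is why the GitHub example works: the writer has already encoded the row to UTF-8 bytes before the reader sees it.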

In addition, cStringIO.StringIO can only handle encoded byte strings, while the pure-Python StringIO.StringIO class happily treats unicode values as if they were byte strings.

The solution is to encode your unicode values before putting them into a StringIO object:

>>> import unicodecsv, StringIO, cStringIO 
>>> f = StringIO.StringIO(u'é,é'.encode('utf8')) 
>>> r = unicodecsv.reader(f, encoding='utf-8') 
>>> next(r) 
[u'\xe9', u'\xe9'] 
>>> f = cStringIO.StringIO(u'é,é'.encode('utf8')) 
>>> r = unicodecsv.reader(f, encoding='utf-8') 
>>> next(r) 
[u'\xe9', u'\xe9'] 
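
For the textarea use case, the same bytes-versus-text distinction can be sketched on Python 3, where the standard csv module works on text directly, so the decoding step is made explicit instead of the encoding step. This is only a rough sketch under that assumption: parse_csv_field is a hypothetical helper name, not part of unicodecsv or of any web framework.

```python
import csv
import io


def parse_csv_field(raw, encoding="utf-8"):
    """Hypothetical helper: parse CSV from a form field given as bytes or text."""
    # Python 3's csv module expects text, so decode bytes first.
    if isinstance(raw, bytes):
        raw = raw.decode(encoding)
    # newline="" leaves line endings untouched so the csv module handles them.
    return list(csv.reader(io.StringIO(raw, newline="")))


rows = parse_csv_field("é,é\r\nñ,ü")
for row in rows:
    print(row)
```

Accepting both types keeps the caller from having to know whether the framework hands over raw bytes or an already decoded string.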

Excellent. Great answer, and quick. Gotta love it :)