2011-06-23

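Python - SQLite to CSV writer error - ASCII values not parsed

Afternoon, I'm having some trouble with a SQLite to CSV Python script. I have searched high and low for an answer, but none has worked for me, or I have a problem with my syntax.

I want to replace characters within the SQLite database that fall outside the ASCII table (values of 128 and above).

This is the script I have been using: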

#!/opt/local/bin/python 
import sqlite3 
import csv, codecs, cStringIO 

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([unicode(s).encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

conn = sqlite3.connect('test.db') 

c = conn.cursor() 

# Select whichever rows you want in whatever order you like 
c.execute('select ROWID, Name, Type, PID from PID') 

writer = UnicodeWriter(open("ProductListing.csv", "wb")) 

# Make sure the list of column headers you pass in are in the same order as your SELECT 
writer.writerow(["ROWID", "Product Name", "Product Type", "PID", ]) 
writer.writerows(c) 

I tried adding 'replace' as suggested here, but got the same error: Python: Convert Unicode to ASCII without errors for CSV file

The error is a UnicodeDecodeError.

Traceback (most recent call last): 
    File "SQLite2CSV1.py", line 53, in <module> 
    writer.writerows(c) 
    File "SQLite2CSV1.py", line 32, in writerows 
    self.writerow(row) 
    File "SQLite2CSV1.py", line 19, in writerow 
    self.writer.writerow([unicode(s).encode("utf-8") for s in row]) 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 65: ordinal not in range(128) 

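Obviously I want the code to be robust enough that, if it encounters characters outside this range, it replaces them with a character such as '?' (\x3F).

Is there a way to do this within the UnicodeWriter class? And is there a way to make the code robust so that it won't produce these errors?

Your help is greatly appreciated.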

Answers

1

If you just want to write an ASCII CSV, simply use the stock csv.writer(). To ensure that every value passed to it is ASCII, use encode('ascii', errors='replace').

Example:

import csv 

rows = [ 
    [u'some', u'other', u'more'], 
    [u'umlaut:\u00fd', u'euro sign:\u20ac', ''] 
] 

with open('/tmp/test.csv', 'wb') as csvFile:
    writer = csv.writer(csvFile)
    for row in rows:
        asciifiedRow = [item.encode('ascii', errors='replace') for item in row]
        print '%r --> %r' % (row, asciifiedRow)
        writer.writerow(asciifiedRow)

The console output is:

[u'some', u'other', u'more'] --> ['some', 'other', 'more'] 
[u'umlaut:\xfd', u'euro sign:\u20ac', ''] --> ['umlaut:?', 'euro sign:?', ''] 

The resulting CSV file contains:

some,other,more 
umlaut:?,euro sign:?, 
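
For the SQLite-to-CSV case in the question, the same idea might look roughly like the sketch below. It is written for Python 3 (an assumption; the code in this thread is Python 2), and the helper names to_ascii, export_ascii_csv and demo, as well as the sample row, are made up for illustration. The table and column names are taken from the question's SELECT statement.

```python
import csv
import io
import sqlite3


def to_ascii(value):
    """Return value as text, with every non-ASCII character replaced by '?'."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        # Bytes that are not valid UTF-8 become U+FFFD here, then '?' below.
        value = value.decode("utf-8", errors="replace")
    return str(value).encode("ascii", errors="replace").decode("ascii")


def export_ascii_csv(conn, query, header, out):
    """Write header plus the rows returned by query to the text stream out."""
    # Hand TEXT columns over as raw bytes so that a badly encoded value
    # cannot raise inside sqlite3 before to_ascii() sees it.
    conn.text_factory = bytes
    writer = csv.writer(out)
    writer.writerow(header)
    for row in conn.execute(query):
        writer.writerow([to_ascii(value) for value in row])


def demo():
    """Build a throwaway in-memory table and return the CSV text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table PID (Name text, Type text, PID integer)")
    conn.execute("insert into PID values (?, ?, ?)",
                 ("Caf\u00e9\u00a0Latte", "drink", 7))
    out = io.StringIO()
    export_ascii_csv(conn, "select ROWID, Name, Type, PID from PID",
                     ["ROWID", "Product Name", "Product Type", "PID"], out)
    conn.close()
    return out.getvalue()
```

Here the accented letter and the non-breaking space (the 0xa0 from the traceback) each come out as '?', so the data row reads 1,Caf??Latte,drink,7. Replacing per value before handing it to csv.writer keeps the writer itself unchanged, which is the point of the answer above.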

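+1, spot on. Also note that UTF-8 is not ASCII, so feeding a UTF-8 string to a function that expects ASCII will often produce hilariously unexpected results (of which 'UnicodeEncodeError' is the most obvious; some effects are more subtle).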
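
A small sketch of the comment's point, again in Python 3 syntax (an assumption, since the thread uses Python 2): the UTF-8 encoding of a non-ASCII character consists only of bytes of 0x80 and above, so a strict ASCII decode fails on the first of them, much like the 0xa0 byte in the question's traceback. The variable names are illustrative.

```python
# UTF-8 encoding of the euro sign: three bytes, none of them valid ASCII.
euro_utf8 = "\u20ac".encode("utf-8")

try:
    euro_utf8.decode("ascii")
    strict_result = "decoded"
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte that could not be decoded.
    strict_result = "UnicodeDecodeError at byte 0x%02x" % euro_utf8[exc.start]

# With errors="replace", each offending byte becomes U+FFFD instead of raising.
lenient = euro_utf8.decode("ascii", errors="replace")
```

Note that the lenient decode yields one replacement character per byte, not per original character, which is one of the subtler effects of treating UTF-8 data as ASCII.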

0

With access to a UNIX environment, here is what worked for me:

sqlite3.exe a.db .dump > a.sql
tr -d '\200-\377' < a.sql > clean.sql
sqlite3.exe clean.db < clean.sql

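(This isn't a Python solution, but maybe it will help someone else because of its brevity. This solution STRIPS OUT all non-ASCII characters; it does not attempt to replace them.)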
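
For comparison, the difference between stripping and replacing can be sketched in Python 3 (an assumption; the thread's own code is Python 2) with the two standard error handlers of str.encode. The sample text is made up for illustration.

```python
text = "umlaut:\u00fd euro:\u20ac"

# errors="ignore" drops non-ASCII characters, similar in effect to the tr command.
stripped = text.encode("ascii", errors="ignore").decode("ascii")

# errors="replace" keeps a '?' placeholder where each character used to be.
replaced = text.encode("ascii", errors="replace").decode("ascii")
```

Stripping shortens the text silently, while replacing leaves a visible marker for every lost character, which is closer to what the question asks for.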