Fixing a UnicodeDecodeError

I have the following code, which I run on Python 2.7, and I am trying to fix a UnicodeDecodeError.

import csv
import sqlite3

conn = sqlite3.connect('torrents.db')
c = conn.cursor()

# Create table (IF EXISTS keeps the DROP from failing on a fresh database)
c.execute('''DROP TABLE IF EXISTS torrents''')
c.execute('''CREATE TABLE IF NOT EXISTS torrents
      (name text, size long, info_hash text, downloads_count long,
      category_id text, seeders long, leechers long)''')

with open('torrents_mini.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter='|')
    for row in spamreader:
        name = unicode(row[0])
        size = row[1]
        info_hash = unicode(row[2])
        downloads_count = row[3]
        category_id = unicode(row[4])
        seeders = row[5]
        leechers = row[6]
        c.execute('''INSERT INTO torrents (name, size, info_hash, downloads_count,
            category_id, seeders, leechers) VALUES (?,?,?,?,?,?,?)''',
            (name, size, info_hash, downloads_count, category_id, seeders, leechers))

conn.commit()
conn.close()

The error message I get is:

Traceback (most recent call last): 
    File "db.py", line 15, in <module> 
    name = unicode(row[0]) 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128)

If I do not convert to Unicode, the error I get is:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

Adding name = row[0].decode('UTF-8') gives me a different error:

Traceback (most recent call last): 
    File "db.py", line 27, in <module> 
    for row in spamreader: 
_csv.Error: line contains NULL byte

The data in the CSV file is in the following format:

Tha Twilight New Moon DVDrip 2009 XviD-AMiABLE|694554360|2cae2fc76d110f35917d5d069282afd8335bc306|0|movies|0|1

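Edit: I finally gave up and did the task with the sqlite3 command-line tool instead (which was very easy). I still do not know what caused the error, but while sqlite3 was importing the CSV file it kept printing warnings about an "unescaped character", the character in question being a double quote (").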
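One possible reading of that warning, which nothing in this thread confirms, is that some names contain a literal double quote that a CSV parser tries to interpret as field quoting. A minimal sketch of how Python's csv module can be told to treat quotes as ordinary characters follows; the sample line is made up for illustration and only mimics the pipe-delimited format shown above.

```python
import csv

# Made-up line in the pipe-delimited format, with a stray double quote.
lines = ['"Twilight New Moon|694554360|movies']

# QUOTE_NONE tells the reader to perform no special processing of quote
# characters, so the stray quote stays part of the first field.
reader = csv.reader(lines, delimiter='|', quoting=csv.QUOTE_NONE)
rows = list(reader)
```

With this setting the line still splits into three fields on the pipe character. Whether that matches the real file is an assumption that would need checking against the actual data.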

Thanks to everyone who tried to help.

Source

2015-01-15 SATW

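Can you give us the **full traceback** of the error? What *encoding* is used in the file? – 2015-01-15 15:41:47

The traceback is: Traceback (most recent call last): File "db.py", line 15, in <module> name = unicode(row[0]) UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128) – SATW 2015-01-15 15:43:04

You need to edit your post to add that information. Do not put it in a comment. – 2015-01-15 15:43:28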

Your data is not encoded as ASCII. Use the correct codec for your data.

You can tell Python which codec to use:

unicode(row[0], correct_codec)

or use the str.decode() method:

row[0].decode(correct_codec)
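To make the difference concrete, here is a minimal sketch. It assumes, purely for illustration, that the data is UTF-8; the sample bytes are made up, and byte 0xc3 also occurs in other encodings, so this does not establish what the real file uses.

```python
# Hypothetical sample: 'é' encoded as UTF-8 is the two bytes 0xc3 0xa9.
raw = b'Caf\xc3\xa9 Society DVDrip'

# In Python 2, unicode(raw) without a codec decodes as ASCII, which
# fails on the first byte >= 0x80, as in the traceback above.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the codec explicitly succeeds, provided the codec is the right one.
name = raw.decode('utf-8')
```

The decoded value is a Unicode string, which is what sqlite3 asks for in its ProgrammingError message.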

We cannot tell you what the correct codec is. You will have to consult whoever or whatever you got the file from.

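If you cannot find out which encoding was used, you can use a package like chardet to make an educated guess, but bear in mind that such a library is not fail-proof.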
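If installing a detection package is not an option, a much cruder guess can be made with the standard library alone. The sketch below is not chardet and does no statistical detection; guess_decode and its candidate list are hypothetical names chosen for illustration. It simply returns the first codec that decodes without error, which does not prove the codec is the right one.

```python
def guess_decode(raw, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Return (text, codec) for the first candidate that decodes raw.

    Hypothetical helper, not part of chardet. latin-1 maps every byte
    to a character, so it never fails and acts as a last resort.
    """
    for codec in candidates:
        try:
            return raw.decode(codec), codec
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate codecs could decode the data')


text, codec = guess_decode(b'Caf\xc3\xa9')        # valid UTF-8
legacy, legacy_codec = guess_decode(b'Caf\xe9')   # not valid UTF-8
```

Strict UTF-8 goes first because text in other encodings rarely happens to be valid UTF-8, so a successful UTF-8 decode is a fairly strong hint. The single-byte fallbacks accept almost anything, so their results should be checked by eye.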

Source

2015-01-15 15:43:11

I do not know which encoding was used. – SATW 2015-01-15 16:06:10
