2015-09-04 88 views
3

Converting a CSV to UTF-8 in Python

I am trying to create a copy of a CSV file without its header row. When I try this, I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1895: invalid start byte. 

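I have read the Unicode and UTF-8 encoding section of the Python csv documentation and implemented what it describes. However, my output file is created with no data in it. I am not sure what I am doing wrong here.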

import csv 

path = '/Users/johndoe/file.csv' 

with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile: 

    def unicode_csv(infile, outfile): 
     inputs = csv.reader(utf_8_encoder(infile)) 
     output = csv.writer(outfile) 

     for index, row in enumerate(inputs): 
      yield [unicode(cell, 'utf-8') for cell in row] 
      if index == 0: 
       continue 
     output.writerow(row) 

    def utf_8_encoder(infile): 
     for line in infile: 
      yield line.encode('utf-8') 

unicode_csv(infile, outfile) 

Answers

5

The solution was simply to pass two extra arguments to

with open(path, 'r') as infile: 

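The two arguments are encoding='utf-8' and errors='ignore'. With them I can create a copy of the original CSV that has no header, and no UnicodeDecodeError is raised. Note that errors='ignore' silently drops any byte that is not valid UTF-8. The complete code follows.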

import csv

path = '/Users/johndoe/file.csv'

# newline='' lets the csv module handle line endings itself, as the csv docs recommend.
with open(path, 'r', encoding='utf-8', errors='ignore', newline='') as infile, \
        open(path + 'final.csv', 'w', encoding='utf-8', newline='') as outfile:
    inputs = csv.reader(infile)
    output = csv.writer(outfile)

    for index, row in enumerate(inputs):
        # Create file with no header: skip the first row
        if index == 0:
            continue
        output.writerow(row)
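
To make the trade-off of errors='ignore' concrete, here is a small sketch using an in-memory byte string instead of the real file. The sample data is made up for illustration, and the guess that the file was saved as Windows-1252 (where byte 0xa0 is a non-breaking space) is only an assumption; the original file's true encoding is not known.

```python
# Hypothetical sample: a CSV fragment containing the lone byte 0xa0.
raw = b"name,price\nwidget,10\xa0USD\n"

# Strict UTF-8 decoding rejects 0xa0, as in the reported error.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='ignore' succeeds, but the offending byte is silently dropped.
ignored = raw.decode("utf-8", errors="ignore")

# Assumption: if the file is really Windows-1252, decoding with that
# codec keeps the character (a non-breaking space) instead of losing it.
as_cp1252 = raw.decode("cp1252")

print(repr(ignored))
print(repr(as_cp1252))
```

If the dropped characters matter, it may be worth opening the input with the encoding it was actually written in and writing the output as UTF-8, rather than ignoring errors.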
2

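Because the line

unicode_csv(infile,outfile) 

is not indented, it is outside the scope of the with statement, so by the time it is called, both infile and outfile have already been closed.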

The files should be opened where they are used, not where the functions are defined, like this:

with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile: 
    unicode_csv(infile,outfile)
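
Putting this together, the following is a minimal Python 3 sketch of that structure, not the asker's exact code. The function name copy_without_header is hypothetical, and io.StringIO objects stand in for the real files so the example runs on its own. It also avoids yield: a function containing yield is a generator, so merely calling it (as the question does) runs none of its body, which is another reason the output file could end up empty.

```python
import csv
import io


def copy_without_header(infile, outfile):
    """Copy every CSV row except the first from infile to outfile."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    # Skip the header; the None default avoids StopIteration on empty input.
    next(reader, None)
    for row in reader:
        writer.writerow(row)


# In-memory stand-ins for the real files (illustrative data only).
src = io.StringIO("id,name\n1,Ann\n2,Bob\n")
dst = io.StringIO()

# The call happens while both "files" are still open.
copy_without_header(src, dst)
result = dst.getvalue()
print(repr(result))
```

With real files, the call to the function would sit inside the with block, as shown in the answer above, so that both file objects are still open when the rows are read and written.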