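I can't read a file because I get a "UnicodeDecodeError: 'utf-8' codec can't decode" error

I have a file that I want to convert to UTF-8. I can't read it, because I get a "UnicodeDecodeError: 'utf-8' codec can't decode" error.

When I try to read it, I get this error: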

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 947: invalid continuation byte

My goal is to read the file and then convert it to UTF-8, but the read itself fails.

Here is my code:

#convert all files into utf_8 format 
import os 
import io 
path_directory="some path string" 
directory = os.fsencode(path_directory) 
for file in os.listdir(directory): 
    file_name=os.fsdecode(file) 
    file_path_source=path_directory+file_name 
    file_path_dest="some address to destination file" 
    with open(file_path_source,"r") as f1: 
     text=f1.read() 
    with io.open(file_path_dest,"w+",encoding='utf8') as f2: 
     f2.write(text) 
    file_path="" 
    file_name="" 
    text=None

And the error is:

--------------------------------------------------------------------------- 
UnicodeDecodeError      Traceback (most recent call last) 
<ipython-input-47-59e5e52ddd40> in <module>() 
    10  with open(file_path,"r") as f1: 
    11   print(type(f1)) 
---> 12   text=f1.read() 
    13  with io.open(file_path.replace("ref_sum","ref_sum_utf_8"),"w+",encoding='utf8') as f2: 
    14   f2.write(text) 

/home/afsharizadeh/anaconda3/lib/python3.6/codecs.py in decode(self, input, final) 
    319   # decode input (taking the buffer into account) 
    320   data = self.buffer + input 
--> 321   (result, consumed) = self._buffer_decode(data, self.errors, final) 
    322   # keep undecoded input until the next call 
    323   self.buffer = data[consumed:] 

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 947: invalid continuation byte

How can I convert my file to UTF-8 without reading it?


2017-08-28 mahsa

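This comes up all the time; it is only hard to search for because there are so many hits. You are telling Python that the file is *already* UTF-8, which isn't true, so decoding fails.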

Does the file contain a UTF header, i.e. '# -*- coding: utf-8 -*-' at the beginning of the file? – 0decimal0

@0decimal0 No, it doesn't. – mahsa
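To illustrate the first comment above, here is a minimal sketch that reproduces the failure on a made-up byte string (the sample text is an assumption, not taken from the asker's file). Byte 0xe9 is 'é' in ISO-8859-1, but in UTF-8 it announces a three-byte sequence, so the ordinary ASCII byte that follows it is rejected as an invalid continuation byte.

```python
# Hypothetical sample text, encoded as ISO-8859-1 (Latin-1).
raw = "caf\u00e9 au lait".encode("iso-8859-1")
assert raw[3] == 0xE9  # 'é' is the single byte 0xe9 in Latin-1

# Decoding as UTF-8 fails, just like reading the file with the wrong codec.
try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = raw.decode("iso-8859-1")
print(decoded)
```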

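That much is clear. If you want to open a file that is not UTF-8 (in Python 3 the default is normally UTF-8, since `open()` uses the locale's preferred encoding; Python 2 defaults to ASCII), you have to state the encoding you know the file is in when you open it: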

io.open(file_path_source,"r",encoding='ISO-8859-1')

In this case the encoding is ISO-8859-1, so that is what you have to specify.
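Putting the answer together, the whole conversion might be sketched as follows. The function name `convert_to_utf8`, its parameters, and the assumption that every file in the directory is ISO-8859-1 are illustrative and not part of the original answer. Note that ISO-8859-1 can decode any byte sequence, so if the real source encoding is something else, this will produce wrong characters silently instead of raising an error.

```python
import os


def convert_to_utf8(src_dir, dest_dir, source_encoding="iso-8859-1"):
    """Re-encode every regular file in src_dir as UTF-8 inside dest_dir.

    source_encoding is an assumption about the input files; pass the
    encoding you know they are really in.
    """
    os.makedirs(dest_dir, exist_ok=True)
    converted = []
    for name in sorted(os.listdir(src_dir)):
        src_path = os.path.join(src_dir, name)
        if not os.path.isfile(src_path):
            continue  # skip subdirectories
        # Decode with the encoding the file is actually in ...
        # (newline="" keeps the original line endings untouched)
        with open(src_path, "r", encoding=source_encoding, newline="") as f1:
            text = f1.read()
        # ... and encode as UTF-8 on the way out.
        dest_path = os.path.join(dest_dir, name)
        with open(dest_path, "w", encoding="utf-8", newline="") as f2:
            f2.write(text)
        converted.append(name)
    return converted
```

A call could look like `convert_to_utf8("ref_sum", "ref_sum_utf_8")`, using the directory names that appear in the traceback purely as an example. Writing to a separate destination directory leaves the originals intact in case the guessed encoding turns out to be wrong.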


2017-08-28 08:39:12 0decimal0
