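I want to write a Python script that finds duplicate files on a USB flash drive. Why am I getting a UnicodeDecodeError here?
The process I'm following is to build a list of file names, hash each file, and then build an inverse dictionary that maps each hash to its files. Somewhere along the way, however, I get a UnicodeDecodeError. Can someone help me understand what is going on?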
from os import listdir
from os.path import isfile, join
from collections import defaultdict
import hashlib

my_path = r"F:/"
files_in_dir = [ file for file in listdir(my_path) if isfile(join(my_path, file)) ]
file_hashes = dict()
for file in files_in_dir:
    file_hashes[file] = hashlib.md5(open(join(my_path, file), 'r').read()).digest()

inverse_dict = defaultdict(list)
for file, file_hash in file_hashes.iteritems():
    inverse_dict[file_hash].append(file)

inverse_dict.items()
The error I'm getting is:
Traceback (most recent call last):
  File "C:\Users\Fotis\Desktop\check_dup.py", line 12, in <module>
    file_hashes[file] = hashlib.md5(open(join(my_path, file), 'r').read()).digest()
  File "C:\Python33\lib\encodings\cp1253.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0xff in position 2227: character maps to <undefined>
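For what it's worth, the traceback shows the failure inside the cp1253 decoder: under Python 3, opening a file with mode 'r' decodes its bytes as text, and byte 0xff has no mapping in that code page. Below is a minimal sketch of the same list, hash, invert process that reads the files as raw bytes instead. The names `hash_file`, `find_duplicates`, and `chunk_size` are hypothetical and not from the script above, and the sketch assumes only regular files directly inside the directory are of interest. It also uses `items()`, since `dict.iteritems()` does not exist in Python 3.

```python
import hashlib
from collections import defaultdict
from os import listdir
from os.path import isfile, join


def hash_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read as raw bytes."""
    digest = hashlib.md5()
    # 'rb' skips text decoding entirely, so no codec (such as cp1253) is involved.
    with open(path, 'rb') as handle:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directory):
    """Map each digest to the file names sharing it, keeping groups of two or more."""
    by_hash = defaultdict(list)
    for name in listdir(directory):
        full_path = join(directory, name)
        if isfile(full_path):
            by_hash[hash_file(full_path)].append(name)
    return {h: sorted(names) for h, names in by_hash.items() if len(names) > 1}
```

Calling `find_duplicates(r"F:/")` would then return a dictionary whose values are the groups of file names with identical contents. Hashing bytes rather than decoded text also matters for correctness here, because `hashlib.md5` in Python 3 accepts bytes, not `str`.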
@MartijnPieters This is Python 3. I will update the question accordingly. – NlightNFotis