See if two files have the same content in Python
To compare files, you can use the filecmp module (http://docs.python.org/library/filecmp.html):
>>> import filecmp
>>> filecmp.cmp('F1.txt', 'F2.txt')
True
>>> filecmp.cmp('F1.txt', 'F3.txt')
False
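One caveat: by default `filecmp.cmp` does a shallow comparison and treats two files as equal when their `os.stat()` signatures (file type, size, modification time) match, without reading them. Below is a minimal sketch that forces a real content comparison with `shallow=False`. The helper name `same_content` is only illustrative and is not part of the original answer.

```python
import filecmp


def same_content(path_a, path_b):
    """Return True only if the two files hold identical bytes."""
    # shallow=False makes filecmp compare the actual contents
    # instead of trusting matching os.stat() signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Calling `same_content('F1.txt', 'F2.txt')` is then a drop-in replacement for the `filecmp.cmp` calls above.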
So one solution would be this (not elegant at all, but it does work):
import filecmp
files = ['F1.txt', 'F2.txt', 'F3.txt', 'F4.txt', 'F5.txt']
comparisons = {}
for itm in range(len(files)):
    # Compare each file with the files one to four positions after it
    for offset in range(1, 5):
        try:
            res = filecmp.cmp(files[itm], files[itm + offset])
            comparisons[files[itm] + ' vs ' + files[itm + offset]] = res
        except IndexError:
            # itm + offset ran past the end of the list; nothing to compare
            pass
print(comparisons)
This gives:
{'F1.txt vs F2.txt': True, 'F1.txt vs F5.txt': False, 'F2.txt vs F4.txt': True,
'F3.txt vs F4.txt': False, 'F1.txt vs F4.txt': True, 'F2.txt vs F3.txt': False,
'F2.txt vs F5.txt': False, 'F1.txt vs F3.txt': False, 'F3.txt vs F5.txt': False,
'F4.txt vs F5.txt': False}
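The same pairwise comparison can be written more compactly with `itertools.combinations`, which yields every unordered pair exactly once and so needs no index arithmetic. This is a sketch rather than the original answer's code; the function name `compare_all` is an assumption, and `shallow=False` is added so that file contents are actually read.

```python
import filecmp
from itertools import combinations


def compare_all(files):
    """Compare every unordered pair of files; return {'A vs B': bool}."""
    results = {}
    for first, second in combinations(files, 2):
        # shallow=False compares contents rather than os.stat() signatures
        results[first + ' vs ' + second] = filecmp.cmp(first, second, shallow=False)
    return results
```

For the five files above, `compare_all(files)` produces the same ten keys as the dictionary shown.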
As for the other part of your question, you can use the built-in shutil and os modules, like this:
import filecmp
import shutil
import os
# Move both files only when they compare equal
if filecmp.cmp('F1.txt', 'F2.txt'):
    shutil.move(os.path.abspath('F1.txt'), 'C:\\example\\path')
    shutil.move(os.path.abspath('F2.txt'), 'C:\\example\\path')
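One thing to watch for: if the destination folder does not exist yet, `shutil.move` renames the file to that path instead of placing it inside a folder. A hedged sketch that creates the folder first; `move_if_identical` and its `dest_dir` parameter are hypothetical names introduced here for illustration.

```python
import filecmp
import os
import shutil


def move_if_identical(path_a, path_b, dest_dir):
    """Move both files into dest_dir when their contents match."""
    if not filecmp.cmp(path_a, path_b, shallow=False):
        return False
    # Make sure the destination is a real directory before moving into it
    os.makedirs(dest_dir, exist_ok=True)
    for path in (path_a, path_b):
        shutil.move(path, os.path.join(dest_dir, os.path.basename(path)))
    return True
```

The return value tells the caller whether anything was moved, which is handy when looping over many pairs.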
Update: a better answer, adapted from @zalew's answer: https://stackoverflow.com/a/748879/5247482
import shutil
import os
import hashlib
def remove_duplicates(dir):
    unique = []
    for filename in os.listdir(dir):
        filepath = os.path.join(dir, filename)
        if os.path.isfile(filepath):
            print('--Checking ' + filepath)
            # Hash the file's contents, not its name, so identical files match
            with open(filepath, 'rb') as f:
                filehash = hashlib.md5(f.read()).hexdigest()
            print(filename, ' has hash: ', filehash)
            if filehash not in unique:
                unique.append(filehash)
            else:
                # Content already seen: move this duplicate out of the source folder
                shutil.move(filepath, 'C:\\example\\path\\destinationfolder')
    return
remove_duplicates('C:\\example\\path\\sourcefolder')
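If you would rather see which files are duplicates before moving anything, you can group paths by their content hash. The sketch below is a variation and not zalew's code: it swaps in SHA-256 for MD5, reads files in chunks so large files need not fit in memory, and uses the hypothetical names `file_digest` and `find_duplicates`.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Hash a file's contents chunk by chunk and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        # iter() with a sentinel keeps reading until read() returns b''
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directory):
    """Return lists of paths in `directory` that share the same content hash."""
    by_hash = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            by_hash[file_digest(path)].append(path)
    # Keep only the groups that contain more than one file
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Each inner list is one group of identical files, so you can keep the first entry and move or delete the rest.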
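You should post the code you have tried. And yes, this is possible in Python. –
You can compute a hash of each file and compare only the hash values. You may want to show us the effort you have put into solving the problem. –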
I tried the following code: file1 = open("F1.txt", "r") file2 = open("F2.txt", "r") file3 = open("F3.txt", "r") file4 = open("F4.txt", "r") file5 = open("F5.txt", "r") list1 = file1.readlines() list2 = file2.readlines() list3 = file3.readlines() list4 = file4.readlines() list5 = file5.readlines() for line1 in list1: for line2 in list2: for line3 in list3: for line3 in list4: for line4 in list5: if line1.strip() in line5.strip() in line5.strip() in line5.strip(): print line1 file3.write(line1) – Manuj