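I want to compare the data in two files and retrieve a list of the offsets where they differ. I tried it on some text files and it worked quite well. However, it does not work on non-text files that still contain ASCII text (executables and so on), which I will call binary data files. Should I use struct to compare the bytes?
It seems to think that some bytes are the same even though they clearly are not when I look at them in a hex editor. I tried printing the binary data that it considers the same, and I get blank lines where the data should be printed. So I think this is the source of the problem.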
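One way to check what those bytes actually are is to print an escaped or hex form instead of the raw data, since raw control bytes may show up as blank or invisible output in a terminal. A minimal sketch, assuming Python 3 (the code in this post is Python 2) and made-up sample values `b1` and `b2` that do not come from the real files:

```python
# Hypothetical sample values, not taken from the real files.
b1 = b"\x00\x01\x7fMZ\x90"
b2 = b"\x00\x01\x7fMZ\x91"

print(b1 == b2)    # == compares the two bytes objects byte for byte
print(repr(b1))    # escape sequences make non-printable bytes visible
print(b1.hex())    # hex digits, comparable to a hex editor view
print(b2.hex())

# Offsets at which the two samples differ.
diff_offsets = [i for i, (x, y) in enumerate(zip(b1, b2)) if x != y]
print(diff_offsets)
```

If the escaped or hex output of two chunks is identical, then `==` reporting them as equal is consistent with the data, and the discrepancy is more likely in the offset bookkeeping than in the comparison itself.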
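So what is the best way to compare bytes of data that may be binary and may also contain ASCII text? I thought that using the struct module would be a starting point...
As you can see below, I compare the bytes with the == operator.
Here's the code: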

    import os
    import math

    #file1 = 'file1.txt'
    #file2 = 'file2.txt'
    file1 = 'file1.exe'
    file2 = 'file2.exe'
    file1size = os.path.getsize(file1)
    file2size = os.path.getsize(file2)
    a = file1size - file2size
    end = file1size #if they are both same size
    if a > 0:
        #file 2 is smallest
        end = file2size
        big = file1size
    elif a < 0:
        #file 1 is smallest
        end = file1size
        big = file2size

    f1 = open(file1, 'rb')
    f2 = open(file2, 'rb')

    readSize = 500
    r = readSize
    off = 0
    data = []
    looking = False
    d = open('data.txt', 'w')

    while off < end:
        f1.seek(off)
        f2.seek(off)
        b1, b2 = f1.read(r), f2.read(r)
        same = b1 == b2
        print ''
        if same:
            print 'Same at: '+str(off)
            print 'readSize: '+str(r)
            print b1
            print b2
            print ''
            #save offsets of the section of "different" bytes
            #data.append([diffOff, diffOff+off-1]) #[begin diff off, end diff off]
            if looking:
                d.write(str(diffOff)+" => "+str(diffOff+off-2)+"\n")
                looking = False
                r = readSize
                off = off + 1
            else:
                off = off + r
        else:
            if r == 1:
                looking = True
                diffOff = off
                off = off + 1 #continue reading 1 at a time, until u find a same reading
            r = 1 #it will shoot back to the last off, since we didn't increment it here

    d.close()
    f1.close()
    f2.close()

    #add the diff ending portion to diff data offs, if 1 file is longer than the other
    a = int(math.fabs(a)) #get abs val of diff
    if a:
        data.append([big-a, big-1])

    print data

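For comparison, the same goal (a list of offset ranges where the two files differ) can be sketched without `seek` or a changing read size. This is only an illustration under stated assumptions, not a drop-in fix for the code above: it assumes Python 3, the function name `diff_ranges` and the `chunk_size` default are hypothetical, and ranges are reported as inclusive `(start, end)` tuples, with the extra tail of the longer file counted as a difference.

```python
def diff_ranges(path1, path2, chunk_size=4096):
    """Return inclusive (start, end) offset ranges where the two files differ."""
    ranges = []
    start = None  # offset where the current run of differing bytes began
    offset = 0    # file offset of the first byte of the current chunk
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            b1 = f1.read(chunk_size)
            b2 = f2.read(chunk_size)
            if not b1 and not b2:
                break  # both files are exhausted
            n = max(len(b1), len(b2))
            if b1 == b2:
                # Whole chunk is identical: close any run that is still open.
                if start is not None:
                    ranges.append((start, offset - 1))
                    start = None
            else:
                for i in range(n):
                    # Slicing past the end yields b"", so a byte that exists
                    # in only one file counts as a difference.
                    same = b1[i:i + 1] == b2[i:i + 1]
                    if not same and start is None:
                        start = offset + i
                    elif same and start is not None:
                        ranges.append((start, offset + i - 1))
                        start = None
            offset += n
    if start is not None:
        # The last run of differences reaches the end of the longer file.
        ranges.append((start, offset - 1))
    return ranges
```

A call such as `diff_ranges('file1.exe', 'file2.exe')` returns a list of tuples. Note that the start of a run is recorded only once, when the run begins, rather than being updated on every differing byte.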
Did you try it? – 2010-08-17 10:06:22