For everything except the regions, you can use numpy. Something like this (untested):
import numpy as np
a = np.fromfile("file A", dtype="uint8")
b = np.fromfile("file B", dtype="uint8")
# Compute the number of bytes that are different
different_bytes = np.sum(a != b)
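# Compute the sum of the differences. Cast to a signed type first,
# because uint8 subtraction wraps around modulo 256.
signed_diff = a.astype(np.int16) - b.astype(np.int16)
difference = np.sum(signed_diff, dtype=np.int64)
# Compute the sum of the absolute value of the differences
absolute_difference = np.sum(np.abs(signed_diff), dtype=np.int64)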
# In some cases, the number of bits that have changed is a better
# measurement of change. To compute it we make a lookup array where
# bitcount_lookup[byte] == number_of_1_bits_in_byte (so
# bitcount_lookup[0:16] == [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4])
bitcount_lookup = np.array(
[bin(i).count("1") for i in range(256)], dtype="uint8")
# Numpy allows using an array as an index. ^ computes the XOR of
# each pair of bytes. The result is a byte with a 1 bit where the
# bits of the input differed, and a 0 bit otherwise.
bit_diff_count = np.sum(bitcount_lookup[a^b])
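I couldn't find a numpy function for computing the regions, but you can write your own
using a != b as input; it shouldn't be hard. See this question for inspiration.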
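As a rough sketch of what such a region finder could look like: the helper below (`differing_regions` is a hypothetical name, not a numpy function) takes the `a != b` mask, pads it with zeros, and uses `np.diff` to locate where each run of differing bytes starts and stops. It assumes both arrays have the same length and treats a "region" as a maximal run of consecutive differing bytes, which may not match the definition you have in mind.

```python
import numpy as np

def differing_regions(a, b):
    """Return (start, stop) index pairs for each run of differing bytes.

    Hypothetical helper: stop is exclusive, and a and b are assumed
    to have the same length.
    """
    mask = (a != b).astype(np.int8)
    # Pad with zeros so that runs touching either end are still detected.
    padded = np.concatenate(([0], mask, [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # 0 -> 1 transitions
    stops = np.flatnonzero(edges == -1)   # 1 -> 0 transitions
    return list(zip(starts.tolist(), stops.tolist()))

a = np.array([1, 2, 3, 4, 5, 6], dtype="uint8")
b = np.array([1, 9, 9, 4, 5, 7], dtype="uint8")
regions = differing_regions(a, b)
print(regions)  # [(1, 3), (5, 6)]
```

The number of regions is then `len(regions)`, and the size of each one is `stop - start`.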
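What's wrong with the 'cmp' program you already have on Linux? http://en.wikipedia.org/wiki/Cmp_(Unix). Also, what should the result be if the files differ in size (or the regions being compared differ in size)?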
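The numpy code in the answer assumes both arrays have the same length. One possible convention, and it is only an assumption about what you might want, is to compare the common prefix and report the leftover bytes separately. `compare_common_prefix` is a made-up name for illustration:

```python
import numpy as np

def compare_common_prefix(a, b):
    """Hypothetical helper: count differing bytes in the overlapping part
    and return the number of extra bytes in the longer input separately."""
    n = min(len(a), len(b))
    differing = int(np.count_nonzero(a[:n] != b[:n]))
    extra = abs(len(a) - len(b))
    return differing, extra

a = np.array([1, 2, 3, 4], dtype="uint8")
b = np.array([1, 9, 3], dtype="uint8")
result = compare_common_prefix(a, b)
print(result)  # (1, 1)
```

Whether the extra bytes should count as differences depends on what the measure is for.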
Your definition of "cumulative difference" means that the cumulative difference between '00 ff' and 'ff 00' is 0. Is that intended?
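To make that concrete, here is a small check with the two byte pairs, assuming "cumulative difference" means the sum of the signed per-byte differences:

```python
import numpy as np

a = np.array([0x00, 0xFF], dtype="uint8")
b = np.array([0xFF, 0x00], dtype="uint8")

# Cast to a signed type so the subtraction does not wrap around modulo 256.
signed_diff = a.astype(np.int16) - b.astype(np.int16)

cumulative = int(signed_diff.sum())           # -255 + 255
absolute = int(np.abs(signed_diff).sum())     # 255 + 255
changed_bytes = int(np.count_nonzero(a != b))

print(cumulative, absolute, changed_bytes)  # 0 510 2
```

Both bytes changed, yet the signed sum cancels out to 0, while the absolute sum does not.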
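As a more general note: without a stated purpose, asking for other "difference measures" is meaningless. You can treat the two files as vectors (of bytes) and use any finite-dimensional vector norm.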
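A minimal sketch of the vector-norm idea, using `np.linalg.norm` on the difference vector. The choice of the L1, L2 and L-infinity norms and the name `norm_distances` are illustrative assumptions; which norm is appropriate depends on the purpose.

```python
import numpy as np

def norm_distances(a, b):
    """Hypothetical helper: distance between two equal-length byte arrays
    under a few common vector norms."""
    # Convert to float first so the uint8 subtraction cannot wrap around.
    d = a.astype(np.float64) - b.astype(np.float64)
    return {
        "l1": float(np.linalg.norm(d, ord=1)),         # sum of |d_i|
        "l2": float(np.linalg.norm(d, ord=2)),         # Euclidean length
        "linf": float(np.linalg.norm(d, ord=np.inf)),  # largest |d_i|
    }

a = np.array([0, 3, 10], dtype="uint8")
b = np.array([4, 0, 10], dtype="uint8")
dist = norm_distances(a, b)
print(dist)
```

Here the L1 norm coincides with the sum of absolute differences from the answer.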