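Image comparison by fingerprinting

I am looking for ways to find duplicate images by fingerprinting. I understand that this is done by applying a hash function to the images, so that each image gets its own hash value.

I am fairly new to image processing and do not know much about hashing. How should I apply a hash function and generate the hash values?

Thanks in advance.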

2016-10-15 Humza

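You need to be careful with hashing: some image formats (such as JPEG and PNG) store dates/times and other information inside the image file, which makes two identical images look different to ordinary tools such as md5 and cksum.

Here is an example. Make two images, both identical 128x128 red squares, with ImageMagick at the command line in Terminal: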

convert -size 128x128 xc:red a.png 
convert -size 128x128 xc:red b.png

Now check their MD5 checksums:

md5 [ab].png 
MD5 (a.png) = b4b82ba217f0b36e6d3ba1722f883e59 
MD5 (b.png) = 6aa398d3aaf026c597063c5b71b8bd1a

Or their cksum checksums:

cksum [ab].png 
4158429075 290 a.png 
3657683960 290 b.png

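Oops, they differ according to both md5 and cksum. Why? Because the dates stored in the files are 1 second apart.

I would suggest you use ImageMagick to checksum "just the image data" and not the metadata, unless, of course, the date is important to you: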

identify -format %# a.png 
e74164f4bab2dd8f7f612f8d2d77df17106bac77b9566aa888d31499e9cf8564 

identify -format %# b.png 
e74164f4bab2dd8f7f612f8d2d77df17106bac77b9566aa888d31499e9cf8564

Now they are both the same, because the images are the same; only the metadata differs.
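The same "hash the pixels, not the file" idea can be sketched in Python. This is only a rough sketch under assumptions: it uses SHA-256 over the decoded pixels, the function names are made up for illustration, and the Pillow library is assumed to be installed for decoding. The digests it produces will not match the values printed by `identify -format %#`, so compare them only with each other.

```python
import hashlib


def pixel_fingerprint(width, height, mode, pixel_bytes):
    """Hash decoded pixel data plus the minimum needed to interpret it."""
    h = hashlib.sha256()
    # Include size and mode so that, e.g., a 4x1 and a 2x2 image holding the
    # same raw bytes do not end up with the same fingerprint.
    h.update(f"{mode}:{width}x{height}:".encode("ascii"))
    h.update(pixel_bytes)
    return h.hexdigest()


def image_fingerprint(path):
    """Decode an image file and hash only its pixels (metadata is ignored)."""
    from PIL import Image  # assumption: Pillow is installed

    with Image.open(path) as im:
        # Normalise to RGBA so palette and RGB encodings of the same picture
        # produce the same bytes.
        rgba = im.convert("RGBA")
        return pixel_fingerprint(rgba.width, rgba.height, rgba.mode, rgba.tobytes())
```

With the two files from the example, `image_fingerprint("a.png") == image_fingerprint("b.png")` would be expected to hold, since only the stored dates differ.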

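Of course, you may be more interested in "perceptual hashing", where you only get an idea of whether two images "look similar". If so, have a look here.

Or you may want to allow for slight differences in brightness, orientation or cropping, which is another topic altogether.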


2016-10-15 11:46:29

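There are many ways you can do this, but the simplest is to convert the image into a base64 string and then use a standard hashing library. In Python it looks like: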

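import base64
import hashlib  # the standalone md5 module no longer exists in Python 3

# Read the raw file bytes and base64-encode them
with open("foo.png", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read())

# Feed the encoded bytes to MD5 and print the hexadecimal digest
m = hashlib.md5()
m.update(encoded_string)
fingerprint = m.hexdigest()
print(fingerprint)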

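If all you want is a hash function that turns one (possibly large) string into another, this should be fine. In the code above, m.update() feeds encoded_string (a very large base64 string) into the hash, and m.hexdigest() returns the result as a much shorter hexadecimal string.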

You can read the Python documentation for MD5 hashing here, but whatever language you use should have something similar.
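As a variation on the above, the base64 step is not strictly needed, since a hash function accepts raw bytes directly. The following is a minimal sketch, assuming you only want a fingerprint of the file contents; the function name and the chunk size are illustrative choices, not part of any library.

```python
import hashlib


def file_fingerprint(path, algorithm="md5", chunk_size=65536):
    """Hash a file's raw bytes in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Calling `file_fingerprint("foo.png")` gives the MD5 of the file bytes, and `file_fingerprint("foo.png", "sha256")` switches the algorithm. Like any whole-file hash, this still changes when only the metadata changes.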


2016-10-15 10:28:03

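If you are interested in finding near duplicates, including images that have been resized, you can apply difference hashing. More on hashing here. The code below is edited from a Real Python blog post to make it work in Python 3. It uses the hashing library linked above (imagehash), which has information on the different kinds of hashing. You should be able to copy and paste the scripts and run them straight from the command line without editing them.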
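As an aside, to make the idea of a difference hash concrete, here is a rough sketch of the core step in plain Python. It assumes the image has already been shrunk to a small grayscale grid (for example 9 pixels wide by 8 high, which with Pillow could be done with `Image.open(path).convert("L").resize((9, 8))`). The function names are illustrative, and the output is not guaranteed to match what the imagehash library prints.

```python
def dhash_bits(gray_rows):
    """Difference hash of a small grayscale grid.

    gray_rows is a list of rows of brightness values. Each row yields one bit
    per pair of horizontal neighbours, so 9 values per row give 8 bits per row.
    """
    bits = []
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            # The bit records only whether brightness rises to the right.
            bits.append(1 if right > left else 0)
    return bits


def bits_to_hex(bits):
    """Pack a list of 0/1 bits into a fixed-width hexadecimal string."""
    value = int("".join(str(b) for b in bits), 2)
    width = (len(bits) + 3) // 4
    return f"{value:0{width}x}"
```

Because only the relative brightness of neighbouring pixels is kept, adding a constant to every pixel leaves the hash unchanged, and resizing tends to change few or no bits.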

This first script (index.py) creates a difference hash for each image and then stores it in a shelf (a persistent dictionary that you can access later, much like a database), keyed by the hash and holding the image filename(s) that share that hash:

from PIL import Image
import imagehash
import argparse
import shelve
import glob
import os

# This is just so you can run it from the command line
ap = argparse.ArgumentParser()
ap.add_argument('-d', '--dataset', required=True,
                help='path to input dataset of images')
ap.add_argument('-s', '--shelve', required=True,
                help='output shelve database')
args = ap.parse_args()

# open the shelve database
db = shelve.open(args.shelve, writeback=True)

# loop over the image dataset
for imagePath in glob.glob(args.dataset + '/*.jpg'):
    # load the image and compute the difference hash
    image = Image.open(imagePath)
    h = str(imagehash.dhash(image))
    print(h)

    # extract the filename from the path and update the database using the hash
    # as the key, appending the filename to the list of values
    filename = os.path.basename(imagePath)
    db[h] = db.get(h, []) + [filename]

# close the shelve database
db.close()

Run on the command line:

python index.py --dataset ./image_directory --shelve db.shelve

Run in a Jupyter notebook:

%run index.py --dataset ./image_directory --shelve db.shelve
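Once the shelf exists, one way to list duplicates inside the dataset itself is to look for hashes that map to more than one filename. This is a small sketch, not part of the original scripts; the function names are hypothetical, and `db` can be any mapping from hash string to a list of filenames, such as the shelf written by index.py.

```python
import shelve


def duplicate_groups(db):
    """Return {hash: filenames} for every hash shared by two or more files."""
    return {h: list(names) for h, names in db.items() if len(names) > 1}


def report_duplicates(shelf_path):
    """Open a shelf written by index.py and print each group of duplicates."""
    with shelve.open(shelf_path) as db:
        for h, names in duplicate_groups(db).items():
            print(h, names)
```

For example, `report_duplicates("db.shelve")` would print one line per hash that more than one image produced.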

Now that everything is stored in a shelf, you can query the shelf with the filename of the image you want to check. The script (search.py) prints the filenames of the matching images and opens them:

from PIL import Image
import imagehash
import argparse
import shelve

# arguments for command line
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
                help="path to dataset of images")
ap.add_argument("-s", "--shelve", required=True,
                help="output the shelve database")
ap.add_argument("-q", "--query", required=True,
                help="path to the query image")
args = ap.parse_args()

# open the shelve database
db = shelve.open(args.shelve)

# Load the query image, compute the difference image hash, and grab the images
# from the database that have the same hash value (an empty list if none match)
query = Image.open(args.query)
h = str(imagehash.dhash(query))
filenames = db.get(h, [])
print("found {} images".format(len(filenames)))

# loop over the images
for filename in filenames:
    print(filename)
    image = Image.open(args.dataset + "/" + filename)
    image.show()

# close the shelve database
db.close()

Run on the command line to search image_directory for images with the same hash as ./directory/someimage.jpg:

python search.py --dataset ./image_directory --shelve db.shelve --query ./directory/someimage.jpg
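Note that search.py only finds images whose hash is exactly equal to the query hash. To also catch images whose hash differs by a few bits, one option is to compare hashes by Hamming distance. The sketch below assumes the hashes are equal-length hexadecimal strings, as stored by index.py; the function names and the default threshold of 5 bits are illustrative assumptions to tune for your data.

```python
def hamming_distance(hex_a, hex_b):
    """Count differing bits between two equal-length hexadecimal hash strings."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def near_matches(db, query_hash, max_distance=5):
    """Return (distance, hash, filenames) tuples within max_distance, closest first."""
    hits = []
    for stored_hash, filenames in db.items():
        d = hamming_distance(query_hash, stored_hash)
        if d <= max_distance:
            hits.append((d, stored_hash, list(filenames)))
    return sorted(hits)
```

Here `db` can be the opened shelf or any dictionary with the same shape. This scans every key, which is fine for small collections but slow for very large ones.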

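Again, this is modified from the Real Python blog post linked above, which was written for Python 2.7, and it should work out of the box! Just change the command line as needed. If I remember correctly, the Python 2/3 problem was not in the image libraries but elsewhere.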


2017-05-04 01:35:22 StarkJA
