我做了一個如何解決你的第一個問題的例子程序。但是你必須實現你想要比較圖像,音頻和視頻。我假設每個對象的所有列表都具有相同的長度。要回答你的問題二,它會是類似的東西,但有一個雙重循環。
import numpy as np
from random import randint
class Thing:
def __init__(self, words, images, audios, videos):
self.words = words
self.images = images
self.audios = audios
self.videos = videos
def compare(self, other):
score = 0
# Assuming the attribute lists have the same length for both objects
# and that they are sorted in the same manner:
for i in range(len(self.words)):
if self.words[i] == other.words[i]:
score += 1
for i in range(len(self.images)):
if self.images[i] == other.images[i]:
score += 1
# And so one for audio and video. You have to make sure you know
# what method to use for determining when an image/audio/video are
# equal.
return score
N = 1000
things = []
words = np.random.randint(5, size=(N,5))
images = np.random.randint(5, size=(N,5))
audios = np.random.randint(5, size=(N,5))
videos = np.random.randint(5, size=(N,5))
# For testing purposes I assign each attribute to a list (array) containing
# five random integers. I don't know how you actually intend to do it.
for i in xrange(N):
things.append(Thing(words[i], images[i], audios[i], videos[i]))
# I will assume that object number 999 (i=999) is the Ox:
ox = 999
scores = np.zeros(N - 1)
for i in xrange(N - 1):
scores[i] = (things[ox].compare(things[i]))
best = np.argmax(scores)
print "The most similar thing is thing number %d." % best
print
print "Ox attributes:"
print things[ox].words
print things[ox].images
print things[ox].audios
print things[ox].videos
print
print "Best match attributes:"
print things[ox].words
print things[ox].images
print things[ox].audios
print things[ox].videos
編輯:
現在,這裏是sligthly修改回答你的第二個問題相同的程序。結果非常簡單。我基本上只需要添加4行:
- 將
scores
更改爲(N,N)數組而不是(N)。
- 添加
for j in xrange(N):
,從而創建一個雙循環。
if i == j:
break
其中3和4,只是爲了確保我只是比較每對事物一次,而不是兩次,並沒有任何compary事情本身。
然後還有幾行代碼需要提取scores
中5個最大值的索引。我還對打印件進行了重新格式化,以便通過眼睛很容易確認打印對實際上是非常相似的。
來了新的代碼:
import numpy as np
class Thing:
def __init__(self, words, images, audios, videos):
self.words = words
self.images = images
self.audios = audios
self.videos = videos
def compare(self, other):
score = 0
# Assuming the attribute lists have the same length for both objects
# and that they are sorted in the same manner:
for i in range(len(self.words)):
if self.words[i] == other.words[i]:
score += 1
for i in range(len(self.images)):
if self.images[i] == other.images[i]:
score += 1
for i in range(len(self.audios)):
if self.audios[i] == other.audios[i]:
score += 1
for i in range(len(self.videos)):
if self.videos[i] == other.videos[i]:
score += 1
# You have to make sure you know what method to use for determining
# when an image/audio/video are equal.
return score
N = 1000
things = []
words = np.random.randint(5, size=(N,5))
images = np.random.randint(5, size=(N,5))
audios = np.random.randint(5, size=(N,5))
videos = np.random.randint(5, size=(N,5))
# For testing purposes I assign each attribute to a list (array) containing
# five random integers. I don't know how you actually intend to do it.
for i in xrange(N):
things.append(Thing(words[i], images[i], audios[i], videos[i]))
################################################################################
############################# This is the new part: ############################
################################################################################
scores = np.zeros((N, N))
# Scores will become a triangular matrix where scores[i, j]=value means that
# value is the number of attrributes thing[i] and thing[j] have in common.
for i in xrange(N):
for j in xrange(N):
if i == j:
break
# Break the loop here because:
# * When i==j we would compare thing[i] with itself, and we don't
# want that.
# * For every combination where j>i we would repeat all the
# comparisons for j<i and create duplicates. We don't want that.
scores[i, j] = (things[i].compare(things[j]))
# I want the 5 most similar pairs:
n = 5
# This list will contain a tuple for each of the n most similar pairs:
best_list = []
for k in xrange(n):
ij = np.argmax(scores) # Returns a single integer: ij = i*n + j
i = ij/N
j = ij % N
best_list.append((i, j))
# Erease this score so that on next iteration the second largest score
# is found:
scores[i, j] = 0
for k, (i, j) in enumerate(best_list):
# The number 1 most similar pair is the BEST match of all.
# The number N most similar pair is the WORST match of all.
print "The number %d most similar pair is thing number %d and %d." \
% (k+1, i, j)
print "Thing%4d:" % i, \
things[i].words, things[i].images, things[i].audios, things[i].videos
print "Thing%4d:" % j, \
things[j].words, things[j].images, things[j].audios, things[j].videos
print
什麼時候有兩個共同的圖像? – 2014-10-10 12:59:34
當他們有使用海明距離類似的魯棒哈希。 – schoon 2014-10-10 13:12:09