0

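Algorithm for matching objects

I have 1,000 objects, each of which has 4 attribute lists: a list of words, of images, of audio files and of video files.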

I want to compare each object against:

  1. A single object, Ox, from the 1,000.
  2. Every other object.

A comparison would be something like: sum(words in common + images in common + ...).

I want an algorithm that will help me find the closest 5 objects, say, to Ox, and a (different?) algorithm to find the closest 5 pairs of objects.
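
To make the intended comparison concrete, here is a minimal sketch of the "sum of items in common" score and of picking the closest objects to Ox. It assumes every attribute list can be treated as a set of hashable items (for example strings or file hashes) and that objects are plain dicts. The names similarity and closest_to are made up for this illustration.

```python
import heapq

ATTRIBUTES = ("words", "images", "audios", "videos")

def similarity(a, b):
    """Sum of the number of items two objects share in each attribute list."""
    return sum(len(set(a[k]) & set(b[k])) for k in ATTRIBUTES)

def closest_to(objects, ox_index, n=5):
    """Indices of the n objects most similar to objects[ox_index]."""
    ox = objects[ox_index]
    # Every object except Ox itself is a candidate.
    candidates = (i for i in range(len(objects)) if i != ox_index)
    return heapq.nlargest(n, candidates, key=lambda i: similarity(ox, objects[i]))

# Tiny hand-made data set; representing objects as dicts is an assumption.
objects = [
    {"words": ["a", "b"], "images": ["img1"], "audios": [], "videos": []},
    {"words": ["a", "b"], "images": ["img1"], "audios": [], "videos": []},
    {"words": ["a"], "images": [], "audios": [], "videos": []},
    {"words": ["z"], "images": [], "audios": [], "videos": []},
]
print(closest_to(objects, 0, n=2))
```

With 1,000 objects this needs only 999 comparisons against Ox, so a plain linear scan is probably sufficient for the first case.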

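I have looked into cluster analysis and maximum matching, and they don't seem to fit this scenario exactly. I don't want to use these methods if something more apt exists. Does this look like a particular type of algorithm to anyone, or can anyone point me in the right direction for applying the algorithms I mentioned?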

+0

When do two objects have an image in common? – 2014-10-10 12:59:34

+0

When they have similar robust hashes, compared using Hamming distance. – schoon 2014-10-10 13:12:09

Answers

1

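I made an example program that shows how to solve your first problem. You still have to implement how you want to compare images, audio and video yourself. I assume that all the lists have the same length for every object. Your second question can be answered in a similar way, but with a double loop.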

import numpy as np

class Thing:

    def __init__(self, words, images, audios, videos):
        self.words = words
        self.images = images
        self.audios = audios
        self.videos = videos

    def compare(self, other):
        score = 0
        # Assuming the attribute lists have the same length for both objects
        # and that they are sorted in the same manner:
        for i in range(len(self.words)):
            if self.words[i] == other.words[i]:
                score += 1
        for i in range(len(self.images)):
            if self.images[i] == other.images[i]:
                score += 1
        # And so on for audio and video. You have to make sure you know
        # what method to use for determining when an image/audio/video are
        # equal.
        return score


N = 1000
things = []
words = np.random.randint(5, size=(N, 5))
images = np.random.randint(5, size=(N, 5))
audios = np.random.randint(5, size=(N, 5))
videos = np.random.randint(5, size=(N, 5))
# For testing purposes I assign each attribute to a list (array) containing
# five random integers. I don't know how you actually intend to do it.
for i in range(N):
    things.append(Thing(words[i], images[i], audios[i], videos[i]))

# I will assume that object number 999 (i=999) is the Ox:
ox = 999
# Ox is the last thing, so scoring the first N - 1 things skips Ox itself.
scores = np.zeros(N - 1)
for i in range(N - 1):
    scores[i] = things[ox].compare(things[i])

best = np.argmax(scores)
print("The most similar thing is thing number %d." % best)
print()
print("Ox attributes:")
print(things[ox].words)
print(things[ox].images)
print(things[ox].audios)
print(things[ox].videos)
print()
print("Best match attributes:")
print(things[best].words)
print(things[best].images)
print(things[best].audios)
print(things[best].videos)
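
The compare method leaves open when two images count as equal. Following the asker's comment about robust hashes and Hamming distance, one possibility is sketched below. It assumes that each image has already been reduced to an integer hash of fixed bit length by some perceptual hashing step, which is not shown. The function names and the max_distance threshold are illustrative assumptions, and unlike the position-by-position comparison above, this counts a match against any hash in the other list.

```python
def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")

def images_match(hash_a, hash_b, max_distance=4):
    """Treat two images as 'in common' when their hashes differ in few bits."""
    return hamming_distance(hash_a, hash_b) <= max_distance

def count_common(hashes_a, hashes_b, max_distance=4):
    """Count hashes in hashes_a that match at least one hash in hashes_b."""
    return sum(
        any(images_match(a, b, max_distance) for b in hashes_b)
        for a in hashes_a
    )

# 0b1011 and 0b0010 differ in two bit positions.
print(hamming_distance(0b1011, 0b0010))
# Only the first hash is within one bit of 0b11110001.
print(count_common([0b11110000, 0b00001111], [0b11110001], max_distance=1))
```

A suitable threshold depends on the hash length and on how much variation between images should still count as "the same", so it would need tuning on real data.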

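Edit:

Here is a slightly modified version of the same program that answers your second question. It turned out to be very simple. I basically only had to add 4 lines: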

  1. Change scores to an (N, N) array instead of (N).
  2. Add an inner for loop over j, which creates a double loop.
  3. if i == j:
  4. break

Items 3 and 4 are only there to make sure that I compare each pair of things once instead of twice, and that I never compare a thing with itself.

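Then a few more lines of code are needed to extract the indices of the 5 largest values in scores. I also reformatted the printout so that it is easy to confirm by eye that the printed pairs really are very similar.

Here is the new code: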

import numpy as np

class Thing:

    def __init__(self, words, images, audios, videos):
        self.words = words
        self.images = images
        self.audios = audios
        self.videos = videos

    def compare(self, other):
        score = 0
        # Assuming the attribute lists have the same length for both objects
        # and that they are sorted in the same manner:
        for i in range(len(self.words)):
            if self.words[i] == other.words[i]:
                score += 1
        for i in range(len(self.images)):
            if self.images[i] == other.images[i]:
                score += 1
        for i in range(len(self.audios)):
            if self.audios[i] == other.audios[i]:
                score += 1
        for i in range(len(self.videos)):
            if self.videos[i] == other.videos[i]:
                score += 1
        # You have to make sure you know what method to use for determining
        # when an image/audio/video are equal.
        return score


N = 1000
things = []
words = np.random.randint(5, size=(N, 5))
images = np.random.randint(5, size=(N, 5))
audios = np.random.randint(5, size=(N, 5))
videos = np.random.randint(5, size=(N, 5))
# For testing purposes I assign each attribute to a list (array) containing
# five random integers. I don't know how you actually intend to do it.
for i in range(N):
    things.append(Thing(words[i], images[i], audios[i], videos[i]))


################################################################################
############################# This is the new part: ############################
################################################################################
scores = np.zeros((N, N))
# Scores will become a triangular matrix where scores[i, j] = value means that
# value is the number of attributes things[i] and things[j] have in common.
for i in range(N):
    for j in range(N):
        if i == j:
            # Break the loop here because:
            # * When i == j we would compare things[i] with itself, and we
            #   don't want that.
            # * For every combination where j > i we would repeat all the
            #   comparisons for j < i and create duplicates. We don't want
            #   that.
            break
        scores[i, j] = things[i].compare(things[j])

# I want the 5 most similar pairs:
n = 5
# This list will contain a tuple for each of the n most similar pairs:
best_list = []
for k in range(n):
    ij = np.argmax(scores)  # Returns a single flat index: ij = i*N + j
    i = ij // N
    j = ij % N
    best_list.append((i, j))
    # Erase this score so that on the next iteration the second largest
    # score is found:
    scores[i, j] = 0

for k, (i, j) in enumerate(best_list):
    # The number 1 most similar pair is the BEST match of all.
    # Higher numbers are progressively weaker matches.
    print("The number %d most similar pair is thing number %d and %d."
          % (k + 1, i, j))
    print("Thing%4d:" % i,
          things[i].words, things[i].images, things[i].audios, things[i].videos)
    print("Thing%4d:" % j,
          things[j].words, things[j].images, things[j].audios, things[j].videos)
    print()
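
As an alternative to filling a full score matrix and calling argmax repeatedly, the unique pairs can be enumerated with itertools.combinations and the n best picked with heapq.nlargest. This is only a sketch and not part of the program above: closest_pairs, common_positions and the sample data are names and values chosen for illustration. It still performs N(N-1)/2 comparisons, but it never stores more than the n best pairs.

```python
import heapq
from itertools import combinations

def closest_pairs(items, score, n=5):
    """Return the n pairs (i, j), i < j, with the highest score."""
    # combinations yields each unordered pair once and never pairs an item
    # with itself, which is what the `break` in the double loop achieves.
    pairs = combinations(range(len(items)), 2)
    return heapq.nlargest(n, pairs, key=lambda p: score(items[p[0]], items[p[1]]))

def common_positions(a, b):
    """Example score: number of positions where two equal-length lists agree."""
    return sum(x == y for x, y in zip(a, b))

# Hypothetical sample data: items 0 and 1 are identical.
items = [[1, 2, 3], [1, 2, 3], [1, 2, 0], [9, 9, 9]]
print(closest_pairs(items, common_positions, n=2))
```

For the Thing class, the score argument could be something like lambda a, b: a.compare(b).
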
+0

If this answer is what you had in mind, I can modify it to find the 5 closest pairs of objects. – PaulMag 2014-10-10 13:43:45

+0

Thanks! A big help. – schoon 2014-10-13 09:15:41

+0

@schoon No problem. Is this enough for you, or should I extend it to fully answer the second question? – PaulMag 2014-10-13 13:24:30

1

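If your comparison works as "compute the sum of all features and find the objects whose sums are closest", there is a simple trick for finding close objects:

  1. Put all the objects into an array.
  2. Calculate all the sums.
  3. Sort the array by sum.

If you pick any index, the objects close to it will now have nearby indices as well. So to find the 5 closest objects, you only need to look at index-5 to index+5 in the sorted array.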
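
A minimal sketch of these steps, assuming each object has already been reduced to a single number (its feature sum). The function name nearest_by_sum and the sample values are illustrative. Note that "close" here means close in sum, which is not the same thing as sharing many items.

```python
def nearest_by_sum(sums, target, n=5):
    """Indices of the n objects whose sums are closest to sums[target].

    Sort once, then inspect only the n neighbours on each side of the
    target's position in the sorted order.
    """
    order = sorted(range(len(sums)), key=lambda i: sums[i])  # indices sorted by sum
    pos = order.index(target)                                # target's rank
    window = order[max(0, pos - n):pos] + order[pos + 1:pos + 1 + n]
    # The n closest sums must lie inside this window; rank them by distance.
    window.sort(key=lambda i: abs(sums[i] - sums[target]))
    return window[:n]

# Hypothetical feature sums for seven objects.
sums = [10, 3, 11, 50, 9, 14, 2]
print(nearest_by_sum(sums, target=0, n=2))
```

After the one-time sort, each query touches at most 2n neighbours instead of all the objects.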