2012-10-10 36 views
0

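Comparing large vectors in Python

I have two large vectors of different lengths (~133,000 values). Each is sorted in ascending order. I want to find the values that are similar within a given tolerance. This is my solution, but it is very slow. Is there a way to speed it up?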

import numpy as np

for lv_1 in range(np.size(vector1)):
    for lv_2 in range(np.size(vector2)):
        if np.abs(vector1[lv_1] - vector2[lv_2]) < .02:
            print(vector1[lv_1], vector2[lv_2], lv_1, lv_2)
            break

Answers

0

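Your algorithm is far from optimal. You compare far too many values. Suppose you are at some position in vector1 and the current value in vector2 is already more than 0.02 larger. Why would you compare against the rest of vector2?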

Start with something like:

pos1 = 0 
pos2 = 0 

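Now compare the values at these positions in your vectors. If the difference is too large, advance the position that points at the smaller value and check again. Continue until you reach the end of one of the vectors.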
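The walk described above can be sketched as a small function. This is a minimal sketch, not the answerer's own code: the name `match_sorted`, the `tol` parameter, and the sample data are assumptions made for illustration, and it assumes each value should appear in at most one pair.

```python
def match_sorted(a, b, tol=0.02):
    """Pair up values of two ascending sequences that differ by less than tol.

    Each element is used in at most one pair (greedy, left to right).
    Returns a list of (a_value, b_value, a_index, b_index) tuples.
    """
    matches = []
    pos1, pos2 = 0, 0
    while pos1 < len(a) and pos2 < len(b):
        if abs(a[pos1] - b[pos2]) < tol:
            # Close enough: record the pair and advance both positions.
            matches.append((a[pos1], b[pos2], pos1, pos2))
            pos1 += 1
            pos2 += 1
        elif a[pos1] < b[pos2]:
            pos1 += 1  # a's value is the smaller one, so move it forward
        else:
            pos2 += 1  # b's value is the smaller one, so move it forward
    return matches


# Hypothetical sample data, already sorted in ascending order.
vector1 = [0.10, 0.50, 0.90, 1.30]
vector2 = [0.11, 0.49, 1.00, 1.31, 2.00]
print(match_sorted(vector1, vector2))
# [(0.1, 0.11, 0, 0), (0.5, 0.49, 1, 1), (1.3, 1.31, 3, 3)]
```

Because each step advances at least one position, the loop runs at most len(a) + len(b) times instead of len(a) * len(b).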

0

I haven't tested this, but the following should work. The idea is to exploit the fact that the vectors are sorted.

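import numpy as np

lv_1, lv_2 = 0, 0
while lv_1 < len(vector1) and lv_2 < len(vector2):
    if np.abs(vector1[lv_1] - vector2[lv_2]) < .02:
        # Match found: report it and advance both positions.
        print(vector1[lv_1], vector2[lv_2], lv_1, lv_2)
        lv_1 += 1
        lv_2 += 1
    elif vector1[lv_1] < vector2[lv_2]:
        lv_1 += 1  # vector1's value is the smaller one
    else:
        lv_2 += 1  # vector2's value is the smaller one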
0

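The code below gives a good performance increase; how much depends on how densely the numbers are packed. With a set of 1000 random numbers, sampled uniformly between 0 and 100, it runs about 30 times faster than your implementation.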

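results1 = []
pos1_start = 0  # first index of vector2 that can still match

for i in range(np.size(vector1)):
    for j in range(pos1_start, np.size(vector2)):
        if np.abs(vector1[i] - vector2[j]) < .02:
            results1 += [(vector1[i], vector2[j], i, j)]
        else:
            if vector2[j] < vector1[i]:
                # vector2[j] is too small for this and every later vector1[i]
                pos1_start += 1
            else:
                # vector2[j] is too large, and so is everything after it
                break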

Timings:

time new method: 0.112464904785 
time old method: 3.59720897675 

These were produced by the following script:

import random 
import numpy as np 
import time 

# initialize the vectors to be compared 
vector1 = [random.uniform(0, 40) for i in range(1000)] 
vector2 = [random.uniform(0, 40) for i in range(1000)] 

vector1.sort() 
vector2.sort() 

# the arrays that will contain the results for the first method 
results1 = [] 

# the arrays that will contain the results for the second method 
results2 = [] 

pos1_start = 0 

t_start = time.time() 
for i in range(np.size(vector1)):
    for j in range(pos1_start, np.size(vector2)):
        if np.abs(vector1[i] - vector2[j]) < .02:
            results1 += [(vector1[i], vector2[j], i, j)]
        else:
            if vector2[j] < vector1[i]:
                pos1_start += 1
            else:
                break

t1 = time.time() - t_start
print("time new method:", t1)

t = time.time()
for lv1 in range(np.size(vector1)):
    for lv2 in range(np.size(vector2)):
        if np.abs(vector1[lv1] - vector2[lv2]) < .02:
            results2 += [(vector1[lv1], vector2[lv2], lv1, lv2)]
# measure from t, not t_start, so the new method's time is not counted again
t2 = time.time() - t

print("time old method:", t2)
# sort the results

results1.sort()
results2.sort()

print(np.allclose(results1, results2))
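
If the Python-level loops are still too slow at ~133,000 values, the same windowing idea can be expressed with `np.searchsorted`. This is a different technique from the loop above and was not part of the timing comparison; treat it as a sketch under assumptions. The function name `pairs_within_tolerance`, the `tol` parameter, and the sample data are hypothetical, and values that sit exactly on the tolerance boundary may be classified differently because of floating-point rounding.

```python
import numpy as np


def pairs_within_tolerance(a, b, tol=0.02):
    """Return (a_value, b_value, a_index, b_index) for every pair with |a - b| < tol.

    Both inputs are assumed to be sorted in ascending order.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # For each a[i], b[lo[i]:hi[i]] holds the values strictly inside
    # the open interval (a[i] - tol, a[i] + tol).
    lo = np.searchsorted(b, a - tol, side="right")
    hi = np.searchsorted(b, a + tol, side="left")
    return [(a[i], b[j], i, j)
            for i in range(len(a))
            for j in range(lo[i], hi[i])]


# Hypothetical sample data, already sorted in ascending order.
sample1 = [0.10, 0.50, 0.90]
sample2 = [0.09, 0.11, 0.49, 1.00]
pairs = pairs_within_tolerance(sample1, sample2)
print([(i, j) for _, _, i, j in pairs])
```

The two binary searches run inside NumPy, so the remaining Python work is proportional to the number of matches rather than to the product of the two lengths.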
+0

Thanks, this helped a lot! – user1734149