歐幾里得範數使用numexpr

我需要使用numexpr重寫此代碼，它計算矩陣數據[行x列]和向量[1 x列]的歐幾里得範數矩陣。歐幾里得範數使用numexpr

d = ((data-vec)**2).sum(axis=1)

該怎麼辦？也許還有另一種更快的方法？

我使用hdf5和數據矩陣來源於它的問題。例如，此代碼給出錯誤：對象未對齊。

#naive numpy solution, can be parallel? 
def test_bruteforce_knn(): 
    h5f = tables.open_file(fileName) 

    t0= time.time() 
    d = np.empty((rows*batches,)) 
    for i in range(batches): 
     d[i*rows:(i+1)*rows] = ((h5f.root.carray[i*rows:(i+1)*rows]-vec)**2).sum(axis=1) 
    print (time.time()-t0) 
    ndx = d.argsort() 
    print ndx[:k] 

    h5f.close() 

#using some tricks (don't work error: objects are not aligned) 
def test_bruteforce_knn(): 
    h5f = tables.open_file(fileName) 

    t0= time.time() 
    d = np.empty((rows*batches,)) 
    for i in range(batches): 
     d[i*rows:(i+1)*rows] = (np.einsum('ij,ij->i', h5f.root.carray[i*rows:(i+1)*rows], 
     h5f.root.carray[i*rows:(i+1)*rows]) 
     + np.dot(vec, vec) 
     -2 * np.dot(h5f.root.carray[i*rows:(i+1)*rows], vec)) 
    print (time.time()-t0) 
    ndx = d.argsort() 
    print ndx[:k] 

    h5f.close()

使用numexpr：似乎numexpr不明白h5f.root.carray [我*行：第（i + 1）*行]必須重新分配？

import numexpr as ne 

def test_bruteforce_knn(): 
    h5f = tables.open_file(fileName) 

    t0= time.time() 
    d = np.empty((rows*batches,)) 
    for i in range(batches): 
     ne.evaluate("sum((h5f.root.carray[i*rows:(i+1)*rows] - vec) ** 2, axis=1)") 
    print (time.time()-t0) 
    ndx = d.argsort() 
    print ndx[:k] 

    h5f.close()

來源

2013-10-29 mrgloom

有隻使用NumPy的，這是在使用一個潛在的快速方法（對於非常大的陣列）scikit學習：

def squared_row_norms(X): 
    # From http://stackoverflow.com/q/19094441/166749 
    return np.einsum('ij,ij->i', X, X) 

def squared_euclidean_distances(data, vec): 
    data2 = squared_row_norms(data) 
    vec2 = squared_row_norms(vec) 
    d = np.dot(data, vec.T).ravel() 
    d *= -2 
    d += data2 
    d += vec2 
    return d

這是基於這樣的事實，（X - Y）² = x 2 + y 2 - 2xy，即使對於矢量。

測試：

>>> data = np.random.randn(10, 40) 
>>> vec = np.random.randn(1, 40) 
>>> ((data - vec) ** 2).sum(axis=1) 
array([ 96.75712686, 69.45894306, 100.71998244, 80.97797154, 
     84.7 , 82.28910021, 67.48309433, 81.94813371, 
     64.68162331, 77.43265692]) 
>>> squared_euclidean_distances(data, vec) 
array([ 96.75712686, 69.45894306, 100.71998244, 80.97797154, 
     84.7 , 82.28910021, 67.48309433, 81.94813371, 
     64.68162331, 77.43265692]) 
>>> from sklearn.metrics.pairwise import euclidean_distances 
>>> euclidean_distances(data, vec, squared=True).ravel() 
array([ 96.75712686, 69.45894306, 100.71998244, 80.97797154, 
     84.7 , 82.28910021, 67.48309433, 81.94813371, 
     64.68162331, 77.43265692])

簡介：

>>> data = np.random.randn(1000, 40) 
>>> vec = np.random.randn(1, 40) 
>>> %timeit ((data - vec)**2).sum(axis=1) 
10000 loops, best of 3: 114 us per loop 
>>> %timeit squared_euclidean_distances(data, vec) 
10000 loops, best of 3: 52.5 us per loop

使用numexpr也是可能的，但它似乎並沒有給予任何加速的1000點（在10000，這是不更好）：

>>> %timeit ne.evaluate("sum((data - vec) ** 2, axis=1)") 
10000 loops, best of 3: 142 us per loop

來源

2013-10-29 09:41:50

謝謝，squared_euclidean_distances肯定工作更快，也看看我的更新，不能管理numexpr版本到工作。 – mrgloom

歐幾里得範數使用numexpr

回答

相關問題