2014-06-30 146 views

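Efficient cosine distance computation

I want to find the nearest cosine neighbors of a vector among the rows of a matrix, and I have tested the performance of several Python functions.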

import numpy as np
import scipy.spatial.distance

def cos_loop_spatial(matrix, vector):
    """
    Calculating pairwise cosine distance using a common for loop with the scipy cosine function.
    """
    neighbors = []
    for row in range(matrix.shape[0]):
        neighbors.append(scipy.spatial.distance.cosine(vector, matrix[row,:]))
    return neighbors

def cos_loop(matrix, vector):
    """
    Calculating pairwise cosine similarity (1 - cosine distance) using a common for loop with manually calculated cosine value.
    """
    neighbors = []
    for row in range(matrix.shape[0]):
        vector_norm = np.linalg.norm(vector)
        row_norm = np.linalg.norm(matrix[row,:])
        cos_val = vector.dot(matrix[row,:])/(vector_norm * row_norm)
        neighbors.append(cos_val)
    return neighbors

def cos_matrix_multiplication(matrix, vector): 
    """ 
    Calculating pairwise cosine similarity (1 - cosine distance) using matrix vector multiplication.
    """ 
    dotted = matrix.dot(vector) 
    matrix_norms = np.linalg.norm(matrix, axis=1) 
    vector_norm = np.linalg.norm(vector) 
    matrix_vector_norms = np.multiply(matrix_norms, vector_norm) 
    neighbors = np.divide(dotted, matrix_vector_norms) 
    return neighbors 

cos_functions = [cos_loop_spatial, cos_loop, cos_matrix_multiplication] 

# Test performance and plot the best result of each function.
# Note: %timeit is an IPython magic, so this part only runs in IPython/Jupyter.
import pandas as pd

mat = np.random.randn(1000,1000)
vec = np.random.randn(1000)
cos_performance = {}
for func in cos_functions:
    func_performance = %timeit -o func(mat, vec)
    cos_performance[func.__name__] = func_performance.best

pd.Series(cos_performance).plot(kind='bar')

[Figure: bar chart of the best run time of each function]

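The cos_matrix_multiplication function is clearly the fastest of these, but I would like to know whether you have suggestions for making the matrix-vector cosine distance computation even more efficient.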


Since you have working code and are asking for improvements to it, you may have better luck on Code Review. – wnnmaw


@wnnmaw Ah, I'll give that a try, thanks! –

Answers


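Use scipy.spatial.distance.cdist(mat, vec[np.newaxis,:], metric='cosine'), which basically computes the pairwise distance between every pair of vectors from two collections, where each collection is given as the rows of one of the two input matrices.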
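For illustration, here is a minimal sketch of how that call could be used, assuming the same mat and vec shapes as in the question. The seeded generator, the variable names distances, similarity and nearest, and the choice of five neighbors are my own assumptions, not part of the original post. Note that cdist returns cosine distance (1 - cosine similarity) as a column of shape (n_rows, 1), so it is flattened before sorting.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical test data with the same shapes as in the question.
rng = np.random.default_rng(0)
mat = rng.standard_normal((1000, 1000))
vec = rng.standard_normal(1000)

# cdist expects two 2-D arrays, so the vector is turned into a 1 x n matrix.
# The result has shape (1000, 1); ravel() flattens it to shape (1000,).
distances = cdist(mat, vec[np.newaxis, :], metric='cosine').ravel()

# Sanity check against the matrix-multiplication approach:
# cosine distance = 1 - cosine similarity.
similarity = mat.dot(vec) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(vec))
assert np.allclose(distances, 1.0 - similarity)

# Row indices of the five nearest neighbors (smallest cosine distance first).
nearest = np.argsort(distances)[:5]
```

Whether this beats cos_matrix_multiplication depends on the matrix size and the SciPy build, so it is worth timing both on your own data.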