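Efficient cosine distance calculation

I want to compute the nearest cosine neighbors of a vector among the rows of a matrix, and I have tested the performance of several Python functions.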
import numpy as np
import pandas as pd
import scipy.spatial.distance


def cos_loop_spatial(matrix, vector):
    """
    Calculating pairwise cosine distance using a common for loop with the scipy cosine function.
    """
    neighbors = []
    for row in range(matrix.shape[0]):
        neighbors.append(scipy.spatial.distance.cosine(vector, matrix[row,:]))
    return neighbors

def cos_loop(matrix, vector):
    """
    Calculating pairwise cosine similarity (1 - cosine distance) using a common for loop
    with manually calculated cosine value.
    """
    neighbors = []
    for row in range(matrix.shape[0]):
        # vector_norm does not depend on the row and could be hoisted out of the loop
        vector_norm = np.linalg.norm(vector)
        row_norm = np.linalg.norm(matrix[row,:])
        cos_val = vector.dot(matrix[row,:])/(vector_norm * row_norm)
        neighbors.append(cos_val)
    return neighbors

def cos_matrix_multiplication(matrix, vector):
    """
    Calculating pairwise cosine similarity (1 - cosine distance) using matrix vector multiplication.
    """
    dotted = matrix.dot(vector)
    matrix_norms = np.linalg.norm(matrix, axis=1)
    vector_norm = np.linalg.norm(vector)
    matrix_vector_norms = np.multiply(matrix_norms, vector_norm)
    neighbors = np.divide(dotted, matrix_vector_norms)
    return neighbors

cos_functions = [cos_loop_spatial, cos_loop, cos_matrix_multiplication]

# Test performance and plot the best results of each function
# (%timeit is an IPython magic, so this part must run in IPython or Jupyter)
mat = np.random.randn(1000,1000)
vec = np.random.randn(1000)
cos_performance = {}
for func in cos_functions:
    func_performance = %timeit -o func(mat, vec)
    cos_performance[func.__name__] = func_performance.best
pd.Series(cos_performance).plot(kind='bar')
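
The benchmark above relies on the `%timeit` magic, which only exists inside IPython or Jupyter. As a rough sketch of the same measurement in a plain Python script, the standard `timeit` module can be used instead. The helper name `best_time` and its `repeat`/`number` defaults are assumptions made for illustration, not part of the original benchmark, and the timed function is a condensed copy of `cos_matrix_multiplication`.

```python
import timeit

import numpy as np


def cos_matrix_multiplication(matrix, vector):
    """Cosine similarity between `vector` and each row of `matrix`."""
    return matrix.dot(vector) / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(vector))


def best_time(func, *args, repeat=5, number=10):
    """Best per-call time in seconds over `repeat` runs of `number` calls each.

    Hypothetical helper: roughly mirrors the `.best` value that `%timeit -o` reports.
    """
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


mat = np.random.randn(1000, 1000)
vec = np.random.randn(1000)
print(best_time(cos_matrix_multiplication, mat, vec))
```

The numbers will not match `%timeit` exactly, since `%timeit` chooses its loop count automatically, but the ranking between functions should be comparable.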
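The cos_matrix_multiplication function is clearly the fastest of these, but I would like to know whether you have any suggestions for making the matrix-vector cosine distance calculation even more efficient.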
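
One direction that might be worth measuring, sketched here under the assumption that the same matrix is queried with many different vectors: normalize the rows once up front, so each query reduces to a single matrix-vector product, and use `np.argpartition` to pick the top `k` neighbors without sorting every row. The function names below are hypothetical, and the sketch assumes no row of the matrix (and not the query vector) has zero norm. Whether it actually beats `cos_matrix_multiplication` for a single query has not been benchmarked here.

```python
import numpy as np


def normalize_rows(matrix):
    """Return a copy of `matrix` whose rows have unit Euclidean length.

    Assumes no row is all zeros (that would divide by zero).
    """
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / norms


def cos_similarity_prenormalized(unit_rows, vector):
    """Cosine similarity of `vector` to each row of an already normalized matrix."""
    return unit_rows.dot(vector) / np.linalg.norm(vector)


def nearest_cosine_neighbors(unit_rows, vector, k):
    """Indices of the `k` rows most similar to `vector`, most similar first."""
    sims = cos_similarity_prenormalized(unit_rows, vector)
    # argpartition places the k largest similarities first without a full sort
    top = np.argpartition(-sims, k - 1)[:k]
    # sort only those k candidates by descending similarity
    return top[np.argsort(-sims[top])]


mat = np.random.randn(1000, 1000)
vec = np.random.randn(1000)
unit_mat = normalize_rows(mat)  # done once, reused for every query
print(nearest_cosine_neighbors(unit_mat, vec, k=5))
```

Since the goal is the nearest neighbors rather than the exact distances, the division by the query vector's norm could also be skipped, because it does not change the ranking.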
Since you have working code and are asking for improvements to it, you might have better luck on Code Review. – wnnmaw
@wnnmaw Ah, I'll give that a try, thanks! –