Efficient distance calculation for clustering

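I want to compute the distances from a set of N 3D points to a set of M 3D centers and store the result in an NxM matrix (where column i holds the distances from all points to center i).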

Example:

import numpy as np
data = np.random.rand(100,3) # 100 toy 3D points
centers = np.random.rand(20,3) # 20 toy 3D centers

To compute the distances between all points and a single center we can use broadcasting, which avoids looping over all points:

i = 0  # first center 
np.sqrt(np.sum(np.power(data - centers[i,:], 2),1)) # Euclidean distance
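The single-center line above can also be written with `np.linalg.norm`, which computes the Euclidean norm along a chosen axis. This is a minimal sketch assuming the same `data`/`centers` shapes as the example; the seeded generator is only there to make the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100, 3))     # 100 toy 3D points
centers = rng.random((20, 3))   # 20 toy 3D centers

i = 0  # first center
# centers[i] has shape (3,), so it broadcasts against every row of data
d_manual = np.sqrt(np.sum((data - centers[i]) ** 2, axis=1))
# norm over axis=1 gives one Euclidean distance per point
d_norm = np.linalg.norm(data - centers[i], axis=1)

print(d_norm.shape)                   # (100,)
print(np.allclose(d_manual, d_norm))  # True
```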

Now we can put this code in a loop that iterates over all centers:

distances = np.zeros((data.shape[0], centers.shape[0]))  # the shape must be passed as a tuple: (N, M)
for i in range(centers.shape[0]): 
    distances[:,i] = np.sqrt(np.sum(np.power(data - centers[i,:], 2),1))

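However, this is clearly an operation that could be parallelized and improved.

I wonder whether there is a better way to do this (maybe some multidimensional broadcasting, or some library).

This is a very common problem in clustering and classification, where you want the distances from the data to a set of classes, so I'd expect an efficient implementation to exist.

What is the best way to do this?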


2017-05-05 Sembei Norimaki

There are many options on this topic: http://stackoverflow.com/questions/43367001/how-to-calculate-euclidean-distance-between-pair-of-rows-of-a-numpy-array/43368088#43368088 – NaN

Do you know scikit-learn (http://scikit-learn.org/)? You will find many classification methods there. – Dadep

More specifically, you might want to use a pairwise distance function (http://stackoverflow.com/a/43367358/5786475) or instantiate the k-means method (http://scikit-learn.org/stable/modules/clustering.html#k-means) with your centers and ask it for the distances. – pixelou
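One commonly used option (not necessarily the one in the linked thread) expands the squared distance as ||x||^2 - 2 x·c + ||c||^2, so the NxM matrix comes from a single matrix product instead of an NxMx3 intermediate array. This is a sketch assuming float64 inputs; the function name `pairwise_dist_dot` is hypothetical. Floating-point cancellation can make tiny squared distances slightly negative, so they are clipped at zero before the square root.

```python
import numpy as np

def pairwise_dist_dot(data, centers):
    """Hypothetical helper: Euclidean distances between rows of data (N, D)
    and centers (M, D), returned as an (N, M) matrix."""
    data_sq = np.sum(data ** 2, axis=1)[:, None]        # (N, 1)
    centers_sq = np.sum(centers ** 2, axis=1)[None, :]  # (1, M)
    cross = data @ centers.T                            # (N, M) dot products
    sq = data_sq - 2.0 * cross + centers_sq
    # Clip small negative values caused by rounding before taking the root
    return np.sqrt(np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
data = rng.random((100, 3))
centers = rng.random((20, 3))

dist = pairwise_dist_dot(data, centers)
# Reference result via full broadcasting
ref = np.sqrt(((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1))
print(dist.shape)              # (100, 20)
print(np.allclose(dist, ref))  # True
```

The trade-off is memory versus precision: the matrix-product form never materializes the NxMx3 differences, but it is slightly less accurate for points that are very close to a center.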
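As an illustration of a ready-made pairwise distance function, this sketch swaps in SciPy's `scipy.spatial.distance.cdist`, which takes two point sets and returns the full distance matrix (scikit-learn's `sklearn.metrics.pairwise.euclidean_distances` plays the same role). The toy data is an assumption for the example.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
data = rng.random((100, 3))     # 100 toy 3D points
centers = rng.random((20, 3))   # 20 toy 3D centers

# cdist(XA, XB) returns a (len(XA), len(XB)) matrix; column i holds the
# distances from every point in data to centers[i]
distances = cdist(data, centers, metric="euclidean")
print(distances.shape)  # (100, 20)
```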

Broadcast all the way:

import numpy as np 
data = np.random.rand(100,3) 
centers = np.random.rand(20,3) 
distances = np.sqrt(np.sum(np.power(data[:,None,:] - centers[None,:,:], 2), axis=-1)) 
print(distances.shape)
# (100, 20)
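Once the NxM matrix exists, the usual clustering step of assigning each point to its nearest center is a row-wise `argmin`. A minimal sketch reusing the broadcast expression from this answer on toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100, 3))
centers = rng.random((20, 3))

# (100, 1, 3) - (1, 20, 3) broadcasts to (100, 20, 3); summing the last axis gives (100, 20)
distances = np.sqrt(np.sum((data[:, None, :] - centers[None, :, :]) ** 2, axis=-1))

labels = np.argmin(distances, axis=1)              # index of the nearest center per point
nearest = distances[np.arange(len(data)), labels]  # distance to that center
print(labels.shape, nearest.shape)  # (100,) (100,)
```

Note that the broadcast builds an NxMx3 intermediate array, so memory grows with N*M; for very large inputs that is where chunking or a tree structure starts to pay off.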

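If you only want the nearest center and you have a lot of data points (many more than 100 samples), you should store your data in a KD-tree and query it with the centers (scipy.spatial.KDTree).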
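A sketch with `scipy.spatial.KDTree`, assuming the goal is each data point's nearest center. This version builds the tree on the centers and queries it with the data points, which is the direction that returns one nearest center per point; a tree built on the data and queried with the centers instead returns the nearest data point to each center. The sizes are toy assumptions.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
data = rng.random((100_000, 3))  # many toy 3D points
centers = rng.random((20, 3))    # 20 toy 3D centers

tree = KDTree(centers)
# With k=1, query returns for each data point the distance to its nearest
# center and that center's index, both with shape (N,)
dist, idx = tree.query(data, k=1)
print(dist.shape, idx.shape)  # (100000,) (100000,)
```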


2017-05-05 13:17:40 Paul
