2014-02-25 67 views
0

Calculating the Euclidean distance between two numpy arrays

I have two numpy arrays, as shown below.

X = np.array([-0.34095692,-0.34044722,-0.27155318,-0.21320583,-0.44657865,-0.19587836, -0.29414279, -0.3948753 ,-0.21655774 , -0.34857087]) 
Y = np.array([0.16305762,0.38554548, 0.10412536, -0.57981103, 0.17927523, -0.22612216, -0.34569697, 0.30463137,0.01301744,-0.42661108]) 

These are the x and y coordinates of 10 users. I need to find the similarity between each pair of users. For example:

x1 = -0.34095692 
y1 = 0.16305762 
x2 = -0.34044722 
y2 = 0.38554548 

Euclidean distance = (|x1-y1|^2 + |x2-y2|^2)^1/2 

So in the end I would like to get a matrix like the following. Please help me achieve this.

[image]

+1

Sounds good. What's the question? –

+0

@Jonathon Reinhart: I don't know where to start. Any help? –

+1

Sigh, have you considered asking [Google](http://www.google.com/search?q=numpy+euclidean+distance)? It leads you straight to [this successfully answered question](http://stackoverflow.com/questions/1401712/calculate-euclidean-distance-with-numpy). –

Answers

2

Use zip(X, Y) to get the coordinate pairs. Note that if you want the Euclidean distance between points, it should be (|x1-x2|^2+|y1-y2|^2)^0.5, not (|x1-y1|^2 - |x2-y2|^2)^1/2:

In [125]: coords=list(zip(X, Y))  # list() is needed on Python 3, where zip returns an iterator

In [126]: from scipy import spatial 
    ...: dists=spatial.distance.cdist(coords, coords) 

In [127]: dists 
Out[127]: 
array([[ 0.  , 0.22248844, 0.09104884, 0.75377329, 0.10685954, 
     0.41534165, 0.5109039 , 0.15149362, 0.19490308, 0.58971785], 
     [ 0.22248844, 0.  , 0.28973034, 0.9737061 , 0.23197262, 
     0.62852005, 0.73270705, 0.09751671, 0.39258852, 0.81219719], 
     [ 0.09104884, 0.28973034, 0.  , 0.68642072, 0.19047682, 
     0.33880688, 0.45038919, 0.23539542, 0.1064197 , 0.53629553], 
     [ 0.75377329, 0.9737061 , 0.68642072, 0.  , 0.79415038, 
     0.35411306, 0.24770988, 0.90290761, 0.59283795, 0.20443561], 
     [ 0.10685954, 0.23197262, 0.19047682, 0.79415038, 0.  , 
     0.47665258, 0.54665574, 0.13560014, 0.28381556, 0.61376196], 
     [ 0.41534165, 0.62852005, 0.33880688, 0.35411306, 0.47665258, 
     0.  , 0.15477091, 0.56683251, 0.24003205, 0.25201351], 
     [ 0.5109039 , 0.73270705, 0.45038919, 0.24770988, 0.54665574, 
     0.15477091, 0.  , 0.65808357, 0.36700881, 0.09751671], 
     [ 0.15149362, 0.09751671, 0.23539542, 0.90290761, 0.13560014, 
     0.56683251, 0.65808357, 0.  , 0.34181257, 0.73270705], 
     [ 0.19490308, 0.39258852, 0.1064197 , 0.59283795, 0.28381556, 
     0.24003205, 0.36700881, 0.34181257, 0.  , 0.45902146], 
     [ 0.58971785, 0.81219719, 0.53629553, 0.20443561, 0.61376196, 
     0.25201351, 0.09751671, 0.73270705, 0.45902146, 0.  ]]) 

To get the upper triangle of this array, use numpy.triu:

In [128]: np.triu(dists) 
Out[128]: 
array([[ 0.  , 0.22248844, 0.09104884, 0.75377329, 0.10685954, 
     0.41534165, 0.5109039 , 0.15149362, 0.19490308, 0.58971785], 
     [ 0.  , 0.  , 0.28973034, 0.9737061 , 0.23197262, 
     0.62852005, 0.73270705, 0.09751671, 0.39258852, 0.81219719], 
     [ 0.  , 0.  , 0.  , 0.68642072, 0.19047682, 
     0.33880688, 0.45038919, 0.23539542, 0.1064197 , 0.53629553], 
     [ 0.  , 0.  , 0.  , 0.  , 0.79415038, 
     0.35411306, 0.24770988, 0.90290761, 0.59283795, 0.20443561], 
     [ 0.  , 0.  , 0.  , 0.  , 0.  , 
     0.47665258, 0.54665574, 0.13560014, 0.28381556, 0.61376196], 
     [ 0.  , 0.  , 0.  , 0.  , 0.  , 
     0.  , 0.15477091, 0.56683251, 0.24003205, 0.25201351], 
     [ 0.  , 0.  , 0.  , 0.  , 0.  , 
     0.  , 0.  , 0.65808357, 0.36700881, 0.09751671], 
     [ 0.  , 0.  , 0.  , 0.  , 0.  , 
     0.  , 0.  , 0.  , 0.34181257, 0.73270705], 
     [ 0.  , 0.  , 0.  , 0.  , 0.  , 
     0.  , 0.  , 0.  , 0.  , 0.45902146], 
     [ 0.  , 0.  , 0.  , 0.  , 0.  , 
     0.  , 0.  , 0.  , 0.  , 0.  ]]) 
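
For comparison, the same matrix can be sketched without SciPy by using NumPy broadcasting. This is only a minimal sketch, assuming the X and Y arrays from the question; the names `points`, `diff`, `dists_np`, and `upper` are illustrative and not part of the answer above.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

# Stack the coordinates into an (n, 2) array: one row per user.
points = np.column_stack((X, Y))

# Broadcasting (n, 1, 2) against (1, n, 2) yields every pairwise
# coordinate difference, with shape (n, n, 2).
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]

# Euclidean distance: square root of the summed squared differences.
dists_np = np.sqrt((diff ** 2).sum(axis=-1))

# k=1 keeps only the entries strictly above the diagonal.
upper = np.triu(dists_np, k=1)
```

The result should agree with the `cdist` output above up to floating-point rounding. Broadcasting builds an intermediate (n, n, 2) array, so for a large number of points `cdist` is likely the better choice.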
+0

Thank you so much! Finally got it. Thanks again. :) –

+0

@NilaniAlgiriyage Glad to help, np ;) – zhangxaochen

2

A short snippet (it does not work as-is):

A = (X-Y)**2 
p, q = np.meshgrid(np.arange(10), np.arange(10)) 
np.sqrt(A[p]-A[q]) 

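Edit: explanation

  1. A is simply a precomputed vector of all the squared differences.
  2. The magic is np.meshgrid: the purpose of this function is to generate all pairs of values from two different arrays. It is not the best solution, because you get the whole matrix, but for the number of samples you have it is not a big deal. The generated values correspond to indices into A.
  3. The indexing part, A[p], is also a kind of magic. Try it yourself to understand how it behaves.
  4. Here the matrix is full of nan, but that is what you asked for. The real Euclidean distance uses +, not -.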

p & q:

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]) 

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 
    [2, 2, 2, 2, 2, 2, 2, 2, 2, 2], 
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3], 
    [4, 4, 4, 4, 4, 4, 4, 4, 4, 4], 
    [5, 5, 5, 5, 5, 5, 5, 5, 5, 5], 
    [6, 6, 6, 6, 6, 6, 6, 6, 6, 6], 
    [7, 7, 7, 7, 7, 7, 7, 7, 7, 7], 
    [8, 8, 8, 8, 8, 8, 8, 8, 8, 8], 
    [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]]) 
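
As a hedged sketch of how index grids like these could be used with the point-to-point formula (a plus sign between the squared terms, and differences taken between users rather than between X and Y), assuming the X and Y arrays from the question. The name `dist` is illustrative and not part of the snippet above.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

# p varies along the columns and q along the rows, so (q[i, j], p[i, j])
# enumerates every pair of user indices.
p, q = np.meshgrid(np.arange(len(X)), np.arange(len(X)))

# Indexing a 1-D array with a 2-D integer array returns a 2-D array:
# X[p][i, j] == X[p[i, j]].
dist = np.sqrt((X[p] - X[q]) ** 2 + (Y[p] - Y[q]) ** 2)

# A sum of squares is never negative, so sqrt produces no nan here.
print(np.isnan(dist).any())  # False
```

The matrix is symmetric with zeros on the diagonal, since the distance from user i to user j equals the distance from j to i.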
+0

This is nice! I haven't checked its accuracy. Could you explain it a bit? Anyway, there are a lot of nans, right? –

+0

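Thank you very much for the detailed answer. Yes, it should be +; I have now updated the question. One last thing I didn't get: what do these 'nans' mean? (Are the points closer, further apart, or what?) –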

+0

The difference can be negative, and 'sqrt' turns negative numbers into 'nan'. With the correct formula, you won't get these 'nan's – Kiwi
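
A tiny illustration of that point, as a sketch: `nan` ("not a number") says nothing about whether two users are close or far apart; it only signals that the square root of a negative real number is undefined. The array values and the name `result` are made up for the example.

```python
import numpy as np

# np.sqrt of a negative real number is undefined, so NumPy returns nan
# (and emits a RuntimeWarning, which np.errstate silences here).
with np.errstate(invalid="ignore"):
    result = np.sqrt(np.array([4.0, -4.0]))

# The first entry is an ordinary number, the second is nan.
print(np.isnan(result))
```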