2012-06-15 98 views
4

NumPy: applying a function to each element of an ndarray

I have a three-dimensional ndarray of 2D coordinates, for example:

[[[1704 1240] 
    [1745 1244] 
    [1972 1290] 
    [2129 1395] 
    [1989 1332]] 

[[1712 1246] 
    [1750 1246] 
    [1964 1286] 
    [2138 1399] 
    [1989 1333]] 

[[1721 1249] 
    [1756 1249] 
    [1955 1283] 
    [2145 1399] 
    [1990 1333]]] 

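The ultimate goal is to remove the point closest to a given point ([1989 1332]) from each "group" of 5 coordinates. My thought was to produce a similarly shaped array of distances and then use argmin to determine the indices of the values to remove. However, I am not sure how to apply a function, such as one that calculates the distance to a given point, to every element in an array, at least in a NumPythonic way.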

Answers

4

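List comprehensions are a very inefficient way to deal with numpy arrays. They are an especially poor choice for something like a distance calculation.

To find the difference between your data and a point, you just do data - point. You can then calculate the distance using np.hypot, or, if you prefer, square the differences, sum them, and take the square root.

It is a bit easier if you make it an Nx2 array for the purposes of the calculation, though.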
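As a quick check that the two routes mentioned above agree, here is a minimal sketch. The three-point array `pts` is a hypothetical sample taken from the question's data, and the names `d_hypot` and `d_manual` are made up for illustration.

```python
import numpy as np

# Hypothetical sample: three 2D points in an Nx2 array.
pts = np.array([[1704, 1240],
                [1972, 1290],
                [1989, 1332]])
point = np.array([1989, 1332])

diff = pts - point  # broadcasting subtracts `point` from every row

# Option 1: np.hypot on the x and y differences.
d_hypot = np.hypot(diff[:, 0], diff[:, 1])

# Option 2: square, sum over the coordinate axis, take the square root.
d_manual = np.sqrt((diff ** 2).sum(axis=1))

print(np.allclose(d_hypot, d_manual))
```

Both give the same Euclidean distances; np.hypot is just the more compact spelling for the 2D case.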

Basically, you want something like this:

import numpy as np 

data = np.array([[[1704, 1240], 
        [1745, 1244], 
        [1972, 1290], 
        [2129, 1395], 
        [1989, 1332]], 

       [[1712, 1246], 
        [1750, 1246], 
        [1964, 1286], 
        [2138, 1399], 
        [1989, 1333]], 

       [[1721, 1249], 
        [1756, 1249], 
        [1955, 1283], 
        [2145, 1399], 
        [1990, 1333]]]) 

point = [1989, 1332] 

#-- Calculate distance ------------ 
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier 
dist = data.reshape((-1,2)) - point 
dist = np.hypot(*dist.T) 

# We can then reshape it back to AxBx1 array, similar to the original shape 
dist = dist.reshape(data.shape[0], data.shape[1], 1) 
print(dist)

This yields:

array([[[ 299.48121811], 
     [ 259.38388539], 
     [ 45.31004304], 
     [ 153.5219854 ], 
     [ 0.  ]], 

     [[ 290.04310025], 
     [ 254.0019685 ], 
     [ 52.35456045], 
     [ 163.37074401], 
     [ 1.  ]], 

     [[ 280.55837182], 
     [ 247.34186868], 
     [ 59.6405902 ], 
     [ 169.77926846], 
     [ 1.41421356]]]) 

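Now, removing the closest element is a bit harder than simply getting the closest element.

With numpy, you can use boolean indexing to do this fairly easily.

However, you will need to worry a bit about the alignment of your axes.

The key is to understand that numpy "broadcasts" operations along the last axis. In this case, we want to broadcast along the middle axis.

Also, -1 can be used as a placeholder for the size of an axis. When -1 is given as the size of an axis, numpy calculates the permissible size.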
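To make those two points concrete, here is a small sketch on toy arrays. `arr` and `vals` are hypothetical and not part of the original data; they only mirror its (groups, points, xy) layout.

```python
import numpy as np

# Hypothetical toy array laid out like the question's data: (groups, points, xy).
arr = np.arange(24).reshape(3, 4, 2)

# -1 lets numpy infer that axis: 24 values / 2 columns -> 12 rows.
flat = arr.reshape(-1, 2)
print(flat.shape)

# One value per point, one row per group.
vals = np.array([[5, 1, 7, 3],
                 [2, 9, 4, 8],
                 [6, 6, 0, 1]])

# The per-group minimum has shape (3,). Left as is, it would be lined up
# against the last axis (length 4) rather than against the groups, so it is
# reshaped to (3, 1) before comparing.
group_min = vals.min(axis=1)
mask = vals != group_min.reshape(-1, 1)
print(mask)
```

Each row of `mask` is False only where that group's minimum sits, which is the shape of mask needed for the removal step.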

What we need to do looks a bit like this:

#-- Remove closest point --------------------- 
mask = np.squeeze(dist) != dist.min(axis=1) 
filtered = data[mask] 

# Once again, let's reshape things back to the original shape... 
filtered = filtered.reshape(data.shape[0], -1, data.shape[2]) 

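You could do that in a single line; I am just breaking it down for readability. The key is that dist != something yields a boolean array, which you can then use to index the original array.

So, putting it all together: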

import numpy as np 

data = np.array([[[1704, 1240], 
        [1745, 1244], 
        [1972, 1290], 
        [2129, 1395], 
        [1989, 1332]], 

       [[1712, 1246], 
        [1750, 1246], 
        [1964, 1286], 
        [2138, 1399], 
        [1989, 1333]], 

       [[1721, 1249], 
        [1756, 1249], 
        [1955, 1283], 
        [2145, 1399], 
        [1990, 1333]]]) 

point = [1989, 1332] 

#-- Calculate distance ------------ 
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier 
dist = data.reshape((-1,2)) - point 
dist = np.hypot(*dist.T) 

# We can then reshape it back to AxBx1 array, similar to the original shape 
dist = dist.reshape(data.shape[0], data.shape[1], 1) 

#-- Remove closest point --------------------- 
mask = np.squeeze(dist) != dist.min(axis=1) 
filtered = data[mask] 

# Once again, let's reshape things back to the original shape... 
filtered = filtered.reshape(data.shape[0], -1, data.shape[2]) 

print(filtered)

This yields:

array([[[1704, 1240], 
     [1745, 1244], 
     [1972, 1290], 
     [2129, 1395]], 

     [[1712, 1246], 
     [1750, 1246], 
     [1964, 1286], 
     [2138, 1399]], 

     [[1721, 1249], 
     [1756, 1249], 
     [1955, 1283], 
     [2145, 1399]]]) 

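On a side note, if more than one point is equally close, this will not work. Numpy arrays must have the same number of elements along each dimension, so you will need to re-do your grouping in that case.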
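One possible workaround for ties, which is not part of the original answer, is to build the mask from argmin instead of an equality test. argmin returns only the first minimum in each group, so exactly one point is dropped per group. The helper name `remove_closest` and the `tied` array below are hypothetical.

```python
import numpy as np

def remove_closest(data, point):
    """Drop exactly one nearest point per group from an (A, B, 2) array."""
    diff = data - np.asarray(point)
    dist = np.hypot(diff[..., 0], diff[..., 1])      # shape (A, B)
    nearest = dist.argmin(axis=1)                    # first minimum in each group
    keep = np.ones(dist.shape, dtype=bool)
    keep[np.arange(data.shape[0]), nearest] = False  # one False per group
    return data[keep].reshape(data.shape[0], data.shape[1] - 1, data.shape[2])

# Hypothetical group in which two points are equally close to the target.
tied = np.array([[[0, 1], [1, 0], [5, 5]]])
result = remove_closest(tied, [0, 0])
print(result)
```

Whether dropping only the first of several tied points is the right behaviour depends on the application, so treat this as a sketch rather than a drop-in replacement.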

Ah, somehow I didn't see this before I posted. I thought about using `apply_along_axis`, but I tested it, and this is faster. – senderle

`apply_along_axis` should use less memory, so both approaches are still useful! –

Thanks! Very concise, yet informative. And very fast. – OneTrickyPony

0

There are multiple ways to do this, but here is one that uses a list comprehension:

Distance function:

In [35]: from numpy.linalg import norm 

In [36]: dist = lambda x,y:norm(x-y) 

Input data:

In [39]: GivenMatrix = scipy.rand(3, 5, 2) 

In [40]: GivenMatrix 
Out[40]: 
array([[[ 0.83798666, 0.90294439], 
     [ 0.8706959 , 0.88397176], 
     [ 0.91879085, 0.93512921], 
     [ 0.15989245, 0.57311869], 
     [ 0.82896003, 0.53589968]], 

     [[ 0.0207089 , 0.9521768 ], 
     [ 0.94523963, 0.31079109], 
     [ 0.41929482, 0.88559614], 
     [ 0.87885236, 0.45227422], 
     [ 0.58365369, 0.62095507]], 

     [[ 0.14757177, 0.86101539], 
     [ 0.58081214, 0.12632764], 
     [ 0.89958321, 0.73660852], 
     [ 0.3408943 , 0.45420989], 
     [ 0.42656333, 0.42770216]]]) 

In [41]: q = scipy.rand(2) 

In [42]: q 
Out[42]: array([ 0.03280889, 0.71057403]) 

Compute the distances:

In [44]: distances = [[dist(x, q) for x in SubMatrix] 
         for SubMatrix in GivenMatrix] 

In [45]: distances 
Out[45]: 
[[0.82783910695733931, 
    0.85564093542511577, 
    0.91399620574915652, 
    0.18720096539588818, 
    0.81508758596405939], 
[0.24190557184498068, 
    0.99617079746515047, 
    0.42426891258164884, 
    0.88459501973012633, 
    0.55808740166908177], 
[0.18921712490174292, 
    0.80103146210692744, 
    0.86716521557255788, 
    0.40079819635686459, 
    0.48482888965287363]] 

Get the sort order of the distances within each submatrix:

In [46]: scipy.argsort(distances) 
Out[46]: 
array([[3, 4, 0, 1, 2], 
     [0, 2, 4, 3, 1], 
     [0, 3, 4, 1, 2]]) 

As for deletion, I personally find it easiest to convert GivenMatrix to a list and then use del:

>>> GivenList = GivenMatrix.tolist() 

>>> del GivenList[1][2] # delete third row from the second 5-by-2 submatrix 
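To tie the steps of this answer together, here is a sketch that deletes the nearest row from every group rather than a hand-picked one. It assumes a seeded `numpy.random.default_rng` in place of the random input above, and the names `given`, `nearest` and `trimmed` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the sketch is reproducible
given = rng.random((3, 5, 2))    # stand-in for GivenMatrix
q = rng.random(2)

# Same list-comprehension distance calculation as above.
distances = [[np.linalg.norm(x - q) for x in sub] for sub in given]

# Index of the nearest point in each group.
nearest = np.argmin(distances, axis=1)

given_list = given.tolist()
for group, idx in zip(given_list, nearest):
    del group[idx]               # remove the closest row from this group only

trimmed = np.array(given_list)   # back to an array, one row shorter per group
print(trimmed.shape)
```

Because every group loses exactly one row, the nested list converts cleanly back into a regular array.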
1

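If I understand your question correctly, I think you are looking for apply_along_axis. But first, using numpy's built-in broadcasting, we can simply subtract the point from the array: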

>>> a - numpy.array([1989, 1332]) 
array([[[-285, -92], 
     [-244, -88], 
     [ -17, -42], 
     [ 140, 63], 
     [ 0, 0]], 

     [[-277, -86], 
     [-239, -86], 
     [ -25, -46], 
     [ 149, 67], 
     [ 0, 1]], 

     [[-268, -83], 
     [-233, -83], 
     [ -34, -49], 
     [ 156, 67], 
     [ 1, 1]]]) 

Then we can apply numpy.linalg.norm to it:

>>> dist = a - numpy.array([1989, 1332]) 
>>> normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
>>> normed
array([[ 299.48121811, 259.38388539, 45.31004304, 
     153.5219854 , 0.  ], 
     [ 290.04310025, 254.0019685 , 52.35456045, 
     163.37074401, 1.  ], 
     [ 280.55837182, 247.34186868, 59.6405902 , 
     169.77926846, 1.41421356]]) 
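As an aside that is not in the original answer: on NumPy 1.8 or later, numpy.linalg.norm accepts an axis argument, so the same distances can be computed without apply_along_axis. A minimal sketch, assuming `a` is the array from the question:

```python
import numpy as np

a = np.array([[[1704, 1240], [1745, 1244], [1972, 1290], [2129, 1395], [1989, 1332]],
              [[1712, 1246], [1750, 1246], [1964, 1286], [2138, 1399], [1989, 1333]],
              [[1721, 1249], [1756, 1249], [1955, 1283], [2145, 1399], [1990, 1333]]])
dist = a - np.array([1989, 1332])

# The approach shown above: call norm once per length-2 vector along axis 2.
via_apply = np.apply_along_axis(np.linalg.norm, 2, dist)

# Alternative: let norm reduce over axis 2 directly (needs NumPy 1.8+).
via_axis = np.linalg.norm(dist, axis=2)

print(np.allclose(via_apply, via_axis))
```

Both produce a (3, 5) array of distances; the axis form avoids a Python-level call per point.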

Finally, some boolean mask trickery, along with a couple of reshape calls:

>>> a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2)) 
array([[[1704, 1240], 
     [1745, 1244], 
     [1972, 1290], 
     [2129, 1395]], 

     [[1712, 1246], 
     [1750, 1246], 
     [1964, 1286], 
     [2138, 1399]], 

     [[1721, 1249], 
     [1756, 1249], 
     [1955, 1283], 
     [2145, 1399]]]) 

Joe Kington's answer is faster, though. Oh well. I'll leave this here for posterity.

import numpy
import numpy as np

def joes(data, point): 
    dist = data.reshape((-1,2)) - point 
    dist = np.hypot(*dist.T) 
    dist = dist.reshape(data.shape[0], data.shape[1], 1) 
    mask = np.squeeze(dist) != dist.min(axis=1) 
    return data[mask].reshape((3, 4, 2)) 

def mine(a, point): 
    dist = a - point 
    normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist) 
    return a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2)) 

>>> %timeit mine(data, point) 
1000 loops, best of 3: 586 us per loop 
>>> %timeit joes(data, point) 
10000 loops, best of 3: 48.9 us per loop
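Since %timeit only works inside IPython, here is a rough sketch of the same comparison using the standard-library timeit module. The function bodies follow the two versions above with the hard-coded output shape generalized, and the call count of 200 is an arbitrary assumption. Absolute timings will differ from the numbers quoted here.

```python
import timeit
import numpy as np

data = np.array([[[1704, 1240], [1745, 1244], [1972, 1290], [2129, 1395], [1989, 1332]],
                 [[1712, 1246], [1750, 1246], [1964, 1286], [2138, 1399], [1989, 1333]],
                 [[1721, 1249], [1756, 1249], [1955, 1283], [2145, 1399], [1990, 1333]]])
point = np.array([1989, 1332])

def joes(data, point):
    # Flatten to Nx2, use hypot, then compare against the per-group minimum.
    diff = data.reshape(-1, 2) - point
    dist = np.hypot(*diff.T).reshape(data.shape[0], data.shape[1], 1)
    mask = np.squeeze(dist) != dist.min(axis=1)
    return data[mask].reshape(data.shape[0], -1, data.shape[2])

def mine(a, point):
    # Same idea, but with apply_along_axis and numpy.linalg.norm.
    normed = np.apply_along_axis(np.linalg.norm, 2, a - point)
    keep = normed != normed.min(axis=1).reshape(-1, 1)
    return a[keep].reshape(a.shape[0], -1, a.shape[2])

t_joes = timeit.timeit(lambda: joes(data, point), number=200)
t_mine = timeit.timeit(lambda: mine(data, point), number=200)
print(f"joes: {t_joes:.4f} s, mine: {t_mine:.4f} s for 200 calls each")
```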