2016-01-01 103 views
3

我有兩個陣列,經緯度和長度。我想計算每一對經緯度之間的距離,並與陣列中的每一對經緯度和長度之間的距離。 這是我的兩個數組。成對正半軸的距離計算

lat_array 

array([ 0.33356456, 0.33355585, 0.33355585, 0.33401788, 0.33370132, 
     0.33370132, 0.33370132, 0.33371075, 0.33371075, 0.33370132, 
     0.33370132, 0.33370132, 0.33356488, 0.33356488, 0.33370132, 
     0.33370132, 0.33370132, 0.33401788, 0.33362632, 0.33362632, 
     0.33364007, 0.33370132, 0.33401788, 0.33401788, 0.33358399, 
     0.33358399, 0.33358399, 0.33370132, 0.33370132, 0.33362632, 
     0.33370132, 0.33370132, 0.33370132, 0.33370132, 0.33370132, 
     0.33356488, 0.33356456, 0.33391071, 0.33370132, 0.33356488, 
     0.33356488, 0.33356456, 0.33356456, 0.33356456, 0.33362632, 
     0.33364804, 0.3336314 , 0.33370132, 0.33370132, 0.33370132, 
     0.33364034, 0.33359921, 0.33370132, 0.33360397, 0.33348863, 
     0.33370132]) 
long_array 

array([ 1.27253229, 1.27249141, 1.27249141, 1.27259085, 1.2724337 , 
     1.2724337 , 1.2724337 , 1.27246931, 1.27246931, 1.2724337 , 
     1.2724337 , 1.2724337 , 1.27254305, 1.27254305, 1.2724337 , 
     1.2724337 , 1.2724337 , 1.27259085, 1.27250461, 1.27250461, 
     1.27251211, 1.2724337 , 1.27259085, 1.27259085, 1.27252134, 
     1.27252134, 1.27252134, 1.2724337 , 1.2724337 , 1.27250461, 
     1.2724337 , 1.2724337 , 1.2724337 , 1.2724337 , 1.2724337 , 
     1.27254305, 1.27253229, 1.27266808, 1.2724337 , 1.27254305, 
     1.27254305, 1.27253229, 1.27253229, 1.27253229, 1.27250461, 
     1.27250534, 1.27250184, 1.2724337 , 1.2724337 , 1.2724337 , 
     1.27251339, 1.27223739, 1.2724337 , 1.2722575 , 1.27237575, 
     1.2724337 ]) 

轉換成弧度後。現在我想要第一對lat和long之間的距離,剩餘的lat和long對,等等。並要打印對和相應的距離。

這就是我在做python的。

distance = [] 
R = 6371.0 

for i in range(len(lat_array)): 
    for j in (i+1,len(lat_array)): 
     dlon = long_array[j]-long_array[i] 
     dlat = lat_array[j]-lat_array[i] 
     a = sin(dlat/2)**2 + cos(lat_array[i]) * cos(lat_array[j]) *  
      sin(dlon/2)**2 
     c = 2 * atan2(sqrt(a), sqrt(1 - a)) 

     distance.append(R * c) 

它給了我一個錯誤IndexError: index 56 is out of bounds for axis 0 with size 56 我哪裏做錯了?如果數組很大,如何更快計算?請幫忙。

回答

4

您在代碼中存在拼寫錯誤。更改

for j in (i+1,len(lat_array)): 

for j in range(i+1,len(lat_array)): 

否則你遍歷包含兩個元素i+1len(lat_array)的元組。第二個導致錯誤。

+0

哦,我明白了..那是一個愚蠢的錯誤。但是,我如何用距離打印對子? – Neil

5

假設latlng爲lattitudes &經度陣列和那些弧度有數據,這裏是基於this other solution一個量化的解決方案 -

# Elementwise differentiations for lattitudes & longitudes 
dflat = lat[:,None] - lat 
dflng = lng[:,None] - lng 

# Finally Calculate haversine using its distance formula 
d = np.sin(dflat/2)**2 + np.cos(lat[:,None])*np.cos(lat) * np.sin(dflng/2)**2 
hav_dists = 2 * 6371 * np.arcsin(np.sqrt(d)) 

現在,上面的做法將會給我們輸出所有對不論他們的訂單。因此,我們將有兩對距離輸出:(point1,point2) & (point2,point1),即使距離是相同的。因此,爲了節省內存,並希望更好的性能,可以用np.triu_indices創建獨特配對ID和修改前面列出的方法,像這樣 -

# Elementwise differentiations for lattitudes & longitudes, 
# but not repeat for the same paired elements 
N = lat.size 
idx1,idx2 = np.triu_indices(N,1) 
dflat = lat[idx2] - lat[idx1] 
dflng = lng[idx2] - lng[idx1] 

# Finally Calculate haversine using its distance formula 
d = np.sin(dflat/2)**2 + np.cos(lat[idx2])*np.cos(lat[idx1]) * np.sin(dflng/2)**2 
hav_dists = 2 * 6371 * np.arcsin(np.sqrt(d)) 

功能定義 -

def original_app(lat,lng): 
    distance = [] 
    R = 6371.0 
    for i in range(len(lat)): 
     for j in range(i+1,len(lat)): 
      dlon = lng[j]-lng[i] 
      dlat = lat[j]-lat[i] 
      a = sin(dlat/2)**2 + cos(lat[i]) * cos(lat[j]) * sin(dlon/2)**2 
      c = 2 * atan2(sqrt(a), sqrt(1 - a)) 
      distance.append(R * c) 
    return distance 

def vectorized_app1(lat,lng):        
    dflat = lat[:,None] - lat 
    dflng = lng[:,None] - lng 
    d = np.sin(dflat/2)**2 + np.cos(lat[:,None])*np.cos(lat) * np.sin(dflng/2)**2 
    return 2 * 6371 * np.arcsin(np.sqrt(d)) 

def vectorized_app2(lat,lng):        
    N = lat.size 
    idx1,idx2 = np.triu_indices(N,1) 
    dflat = lat[idx2] - lat[idx1] 
    dflng = lng[idx2] - lng[idx1] 
    d =np.sin(dflat/2)**2+np.cos(lat[idx2])*np.cos(lat[idx1])*np.sin(dflng/2)**2 
    return 2 * 6371 * np.arcsin(np.sqrt(d)) 

驗證輸出 -

In [78]: lat 
Out[78]: array([ 0.33356456, 0.33355585, 0.33355585, 0.33401788, 0.33370132]) 

In [79]: lng 
Out[79]: array([ 1.27253229, 1.27249141, 1.27249141, 1.27259085, 1.2724337 ]) 

In [80]: original_app(lat,lng) 
Out[80]: 
[0.2522702110418014, 
0.2522702110418014, 
2.909533226553249, 
1.0542204712876762, 
0.0, 
3.003834632906676, 
0.9897592295963831, 
3.003834632906676, 
0.9897592295963831, 
2.2276138997714474] 

In [81]: vectorized_app1(lat,lng) 
Out[81]: 
array([[ 0.  , 0.25227021, 0.25227021, 2.90953323, 1.05422047], 
     [ 0.25227021, 0.  , 0.  , 3.00383463, 0.98975923], 
     [ 0.25227021, 0.  , 0.  , 3.00383463, 0.98975923], 
     [ 2.90953323, 3.00383463, 3.00383463, 0.  , 2.2276139 ], 
     [ 1.05422047, 0.98975923, 0.98975923, 2.2276139 , 0.  ]]) 

In [82]: vectorized_app2(lat,lng) 
Out[82]: 
array([ 0.25227021, 0.25227021, 2.90953323, 1.05422047, 0.  , 
     3.00383463, 0.98975923, 3.00383463, 0.98975923, 2.2276139 ]) 

運行時測試 -

In [83]: lat = np.random.randn(1000) 

In [84]: lng = np.random.randn(1000) 

In [85]: %timeit original_app(lat,lng) 
1 loops, best of 3: 2.11 s per loop 

In [86]: %timeit vectorized_app1(lat,lng) 
1 loops, best of 3: 263 ms per loop 

In [87]: %timeit vectorized_app2(lat,lng) 
1 loops, best of 3: 224 ms per loop 

因此,對於性能來說,似乎vectorized_app2可能是要走的路!

+0

'lat [:,None]是什麼意思? – kfx

+0

@kfx這意味着我們將緯度從一維數組擴展到二維數組,通過添加一個單維(尺寸爲沿着該維的元素數爲1),這樣當用'lat'減去時,廣播就會發生,而我們將'lat'中的每個元素相對於其中的所有其他元素減去作爲2D數組。有關更多信息,請參閱廣播文檔 - http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html – Divakar

0

由於這是目前Google「成對正距離」的最高結果,因此我會添加我的兩分錢:如果您有權訪問scikit-learn,則可以快速解決此問題。在查看sklearn.metrics.pairwise_distances時,您會注意到'hasrsine'指標不受支持,但它在sklearn.neighbors.DistanceMetric中實施。

這意味着你可以做到以下幾點:

from sklearn.neighbors import DistanceMetric 

def sklearn_haversine(lat, lon): 
    haversine = DistanceMetric.get_metric('haversine') 
    latlon = np.hstack((lat[:, np.newaxis], lon[:, np.newaxis])) 
    dists = haversine.pairwise(latlon) 
    return 6371 * dists 

注意的latlon串聯,只需要,因爲它們是獨立的陣列。如果您將它們作爲形狀(n_samples, 2)的組合陣列傳遞,則可以直接在其上調用haversine.pairwise。此外,乘以6371也是唯一必要的,如果你需要以千米爲單位的距離。例如。如果您只想找到最接近的一對點,則此步驟不是必需的。

驗證:

In [87]: lat = np.array([ 0.33356456, 0.33355585, 0.33355585, 0.33401788, 0.33370132]) 

In [88]: lng = np.array([ 1.27253229, 1.27249141, 1.27249141, 1.27259085, 1.2724337 ]) 

In [89]: sklearn_haversine(lat, lng) 
Out[89]: 
array([[ 0.  , 0.25227021, 0.25227021, 2.90953323, 1.05422047], 
     [ 0.25227021, 0.  , 0.  , 3.00383463, 0.98975923], 
     [ 0.25227021, 0.  , 0.  , 3.00383463, 0.98975923], 
     [ 2.90953323, 3.00383463, 3.00383463, 0.  , 2.2276139 ], 
     [ 1.05422047, 0.98975923, 0.98975923, 2.2276139 , 0.  ]]) 

性能:

In [91]: lat = np.random.randn(1000) 

In [92]: lng = np.random.randn(1000) 

In [93]: %timeit original_app(lat,lng) 
1 loops, best of 3: 1.46 s per loop 

In [94]: %timeit vectorized_app1(lat,lng) 
10 loops, best of 3: 86.7 ms per loop 

In [95]: %timeit vectorized_app2(lat,lng) 
10 loops, best of 3: 75.7 ms per loop 

In [96]: %timeit sklearn_haversine(lat,lng) 
10 loops, best of 3: 76 ms per loop 

總之,你可以得到Divakar的vectorized_app1的與vectorized_app2在較短和簡單的代碼的速度輸出。