2016-01-02 62 views
0

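Clustering longitude and latitude in Python

I'm working with a DataFrame that holds latitude and longitude data, and I need to cluster together the points that are closest to each other (within 200 meters). This is what I have done so far in Python.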

order_lat order_long 
0 19.111841 72.910729 
1 19.111342 72.908387 
2 19.111342 72.908387 
3 19.137815 72.914085 
4 19.119677 72.905081 
5 19.119677 72.905081 
6 19.119677 72.905081 
7 19.120217 72.907121 
8 19.120217 72.907121 
9 19.119677 72.905081 
10 19.119677 72.905081 
11 19.119677 72.905081 
12 19.111860 72.911346 
13 19.111860 72.911346 
14 19.119677 72.905081 
15 19.119677 72.905081 
16 19.119677 72.905081 
17 19.137815 72.914085 
18 19.115380 72.909144 
19 19.115380 72.909144 
20 19.116168 72.909573 
21 19.119677 72.905081 
22 19.137815 72.914085 
23 19.137815 72.914085 
24 19.112955 72.910102 
25 19.112955 72.910102 
26 19.112955 72.910102 
27 19.119677 72.905081 
28 19.119677 72.905081 
29 19.115380 72.909144 
30 19.119677 72.905081 
31 19.119677 72.905081 
32 19.119677 72.905081 
33 19.119677 72.905081 
34 19.119677 72.905081 
35 19.111860 72.911346 
36 19.111841 72.910729 
37 19.131674 72.918510 
38 19.119677 72.905081 
39 19.111860 72.911346 
40 19.111860 72.911346 
41 19.111841 72.910729 
42 19.111841 72.910729 
43 19.111841 72.910729 
44 19.115380 72.909144 
45 19.116625 72.909185 
46 19.115671 72.908985 
47 19.119677 72.905081 
48 19.119677 72.905081 
49 19.119677 72.905081 
50 19.116183 72.909646 
51 19.113827 72.893833 
52 19.119677 72.905081 
53 19.114100 72.894985 
54 19.107491 72.901760 
55 19.119677 72.905081 

Then I computed the distance between each (lat, long) pair and every other (lat, long) pair in the DataFrame.

import numpy as np
import pandas as pd

# The haversine formula needs coordinates in radians
lat_array = np.radians(np.array(order_data['order_lat']))
long_array = np.radians(np.array(order_data['order_long']))

distance = []
pairs_lat1 = []
pairs_long1 = []
pairs_lat2 = []
pairs_long2 = []
# Visit every unordered pair (i, j) with i < j exactly once
for i in range(len(lat_array)):
    for j in range(i + 1, len(lat_array)):
        dlon = long_array[j] - long_array[i]
        dlat = lat_array[j] - lat_array[i]
        # Parentheses keep the expression valid across the line break
        a = (np.sin(dlat / 2) ** 2
             + np.cos(lat_array[i]) * np.cos(lat_array[j]) * np.sin(dlon / 2) ** 2)
        c = 2 * 6371 * np.arcsin(np.sqrt(a))  # distance in km (Earth radius 6371 km)
        pairs_lat1.append(lat_array[i])
        pairs_long1.append(long_array[i])
        pairs_lat2.append(lat_array[j])
        pairs_long2.append(long_array[j])
        distance.append(c)

df_distance = pd.DataFrame() 
df_distance['lat1'] = np.rad2deg(pairs_lat1) 
df_distance['long1'] = np.rad2deg(pairs_long1) 
df_distance['lat2'] = np.rad2deg(pairs_lat2) 
df_distance['long2'] = np.rad2deg(pairs_long2)  
df_distance['distance'] = distance 


df_distance.head() 

     lat1  long1  lat2  long2  distance 
0  19.111841 72.910729 19.111342 72.908387 2.522482e-01 
1  19.111841 72.910729 19.111342 72.908387 2.522482e-01 
2  19.111841 72.910729 19.137815 72.914085 2.909520e+00 
3  19.111841 72.910729 19.119677 72.905081 1.054209e+00 
4  19.111841 72.910729 19.119677 72.905081 1.054209e+00 
5  19.111841 72.910729 19.119677 72.905081 1.054209e+00 

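This gives me the distance in kilometers between a pair of points (lat1, long1 and lat2, long2), e.g. 252 meters for row 0. How can I cluster the points so that the nearest points end up together, say within a range of 250 meters? Can I use hierarchical clustering in my case?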

+0

Very similar: http://stackoverflow.com/questions/24617013/convert-latitude-and-longitude-to-x-and-y-grid-system-using-python – jbg

Answer

1

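The simplest approach is to build a distance matrix containing the distance between every two points, and then apply any classic clustering algorithm to it. Scikit-learn is one of the most popular libraries for clustering (among many other things). You could also try GVM, which is designed specifically for geospatial clustering.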
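As a rough illustration of the distance-matrix idea, here is a minimal sketch that answers the hierarchical-clustering part of the question. It is only one possible way to do it, and it makes a few assumptions: it uses SciPy's hierarchical clustering rather than Scikit-learn, it takes the 250 m threshold from the question, and the helper names `haversine_matrix` and `cluster_points` are made up for this example. The sample coordinates are a handful of rows from the question's data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

EARTH_RADIUS_KM = 6371.0  # same radius as in the question's code


def haversine_matrix(lat_deg, lon_deg):
    """Return the full matrix of pairwise haversine distances in km."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting builds every (i, j) difference without Python loops
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against tiny floating-point excursions outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def cluster_points(lat_deg, lon_deg, max_km=0.25):
    """Label points so that no two points in a cluster are farther apart than max_km."""
    dist = haversine_matrix(lat_deg, lon_deg)
    # linkage expects the condensed (upper-triangle) form of the matrix
    condensed = squareform(dist, checks=False)
    # Complete linkage merges on the largest pairwise distance between clusters
    tree = linkage(condensed, method="complete")
    # Cut the tree at max_km to get flat cluster labels (1-based integers)
    return fcluster(tree, t=max_km, criterion="distance")


# A few rows taken from the question's data (indices 0, 1, 3, 4 and 12)
lats = [19.111841, 19.111342, 19.137815, 19.119677, 19.111860]
lons = [72.910729, 72.908387, 72.914085, 72.905081, 72.911346]
labels = cluster_points(lats, lons)
print(labels)
```

Complete linkage is chosen here because cutting the tree at 0.25 km then bounds the distance between any two members of a cluster by 250 m. With single linkage, chains of nearby points could grow into clusters much wider than the threshold, so pick the linkage that matches what "within 250 meters" should mean for your data.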