2015-06-01 49 views
1

我想實現的代碼DBSCAN這裏:http://en.wikipedia.org/wiki/DBSCANDBSCAN返回部分集羣

我感到困惑的部分是

expandCluster(P, NeighborPts, C, eps, MinPts) add P to cluster C for each point P' in NeighborPts if P' is not visited mark P' as visited NeighborPts' = regionQuery(P', eps) if sizeof(NeighborPts') >= MinPts NeighborPts = NeighborPts joined with NeighborPts' if P' is not yet member of any cluster add P' to cluster C

我的代碼如下。因爲它現在返回的是部分集羣,即使它不在最近的eps附近,點應該被密度連接。我的代碼只返回每個點的前幾個鄰居。

import numpy 
import time 
from math import radians, cos, sin, asin, sqrt 
import re, math 


def haversine(lon1, lat1, lon2, lat2): 
    """ 
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees) returned as kilometers 
    """ 
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2]) 
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2 
    c = 2 * asin(sqrt(a)) 
    km = 6367 * c 
    return km 



def ST_DBSCAN(points,max_distance,MinPts): 
    global visited 
    visited = [] 
    noise = [] 
    cluster_id = 0 
    clusters = [] 
    in_cluster = [] 
    for p in points: 
     if p not in visited: 
      # neighbor_points = [] 
      visited.append(p) 
      NeighborPts = regionQuery(p,points,max_distance) 
      if len(NeighborPts) < MinPts: 
       noise.append(p) 
      else: 
       cluster_id = cluster_id + 1 
       g = expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster) 
       clusters.append(g) 
    return clusters 

#return len(NeighborPts) 

def expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster): 
    in_cluster.append(p[0]) 
    cluster = [] 
    cluster.append(p[0]) 
    for point in NeighborPts: 
     if point not in visited: 
      visited.append(point) 
      new_neighbors = regionQuery(point,points,max_distance) 
      if len(new_neighbors) >= MinPts: 
       new_neighbors.append(NeighborPts) 
      if point[0] not in in_cluster: 
       in_cluster.append(point[0]) 
       cluster.append(point[0])    
    return cluster 




def regionQuery(p,points,max_distance): 
    neighbor_points = [] 
    for j in points: 
     if j != p: 
      # print 'P is %s and j is %s' % (p[0],j[0]) 
      dist = haversine(p[1],p[2],j[1],j[2]) 
      if dist <= max_distance: 
       neighbor_points.append(j) 
    neighbor_points.append(p) 
    return neighbor_points 

我在下面有一個子集。點1和點5應該是開10.76公里因此它們不應該在初始查詢但它們應包括在相同的簇中,因爲點5密度經由點連接3.

pointList = [[1,36.4686,2.8289], 
[2,36.4706,2.8589], 
[3,36.4726,2.8889], 
[4,36.4746,2.9189], 
[5,36.4766,2.9489], 
[6,36.4786,2.9789], 
[7,36.4806,3.0089], 
[8,36.4826,3.0389], 
[9,36.4846,3.0689], 
[10,36.4866,3.0989]] 

points= pointList 

g = ST_DBSCAN(points,10,3) 
+0

這不回答你的問題,但如果你想要的是一個工作DBSCAN實現,scikit學習有着相當不錯的一個 – oxymor0n

+0

@ oxymor0n感謝您的評論。我試圖實現我自己的功能,以提高我對它的工作方式的理解,並在距離調用中給予一定的靈活性(最終,我想添加更多維度)。 – mech

+0

我認爲scikit版本不是很好*如果你想修改距離函數。它對於歐幾里得距離來說太優化了。 –

回答

1

expandCluster功能忘記新鄰居。

你集更新被交換。

+0

謝謝,我能夠通過切換由pop()方法關閉在new_neighbors每個點的集更新,並將其追加到NeighborPts修復錯誤。 – mech