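How to index a list of points to speed up the search for nearby points?

Given a list of (x, y) points, I'm trying to find the nearby points of each point. How can I index the list of points to speed up the search for nearby points?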

from collections import defaultdict
from math import sqrt
from random import randint

# Generate a list of random (x, y) points
points = [(randint(0, 100), randint(0, 100)) for _ in range(1000)]

def is_nearby(point_a, point_b, max_distance=5):
    """Two points are nearby if their Euclidean distance is less than max_distance"""
    distance = sqrt((point_b[0] - point_a[0])**2 + (point_b[1] - point_a[1])**2)
    return distance < max_distance

# For each point, find nearby points that are within a radius of 5
nearby_points = defaultdict(list)
for point in points:
    for neighbour in points:
        if point != neighbour:
            if is_nearby(point, neighbour):
                nearby_points[point].append(neighbour)

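Is there any way to index points to make the search above faster? I feel there must be some way that is faster than O(len(points)**2).

Edit: in general the points can be floats, not just ints.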


2016-12-24 mchen

If your grid is only 100 * 100, you can arrange your points in a grid. That way you can greatly reduce the search space. –

http://gis.stackexchange.com/questions/22082/how-can-i-use-r-tree-to-find-points-within-a-distance-in-spatialite –
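A minimal sketch of the grid idea from the first comment, applied to the question's all-pairs search. It assumes the points are plain (x, y) tuples that may be floats; the names `bucket_points` and `all_nearby` are made up for this example. With square cells of side `max_distance`, any neighbour closer than `max_distance` can only sit in the same cell or one of the 8 surrounding cells, so each point is compared only against those.

```python
from collections import defaultdict
from math import floor, hypot
from random import uniform


def bucket_points(points, cell_size):
    """Map (cell_x, cell_y) -> list of points falling into that cell."""
    buckets = defaultdict(list)
    for x, y in points:
        buckets[(floor(x / cell_size), floor(y / cell_size))].append((x, y))
    return buckets


def all_nearby(points, max_distance=5):
    """For each point, list the other points closer than max_distance.

    Cells have side max_distance, so a neighbour is always in the same
    cell or in one of the 8 cells around it.
    """
    buckets = bucket_points(points, max_distance)
    nearby = defaultdict(list)
    for (cx, cy), members in buckets.items():
        # Gather candidates from the 3x3 block of cells around (cx, cy);
        # .get() avoids adding empty cells to the defaultdict while iterating.
        candidates = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                candidates.extend(buckets.get((cx + dx, cy + dy), ()))
        for p in members:
            for q in candidates:
                if p != q and hypot(q[0] - p[0], q[1] - p[1]) < max_distance:
                    nearby[p].append(q)
    return nearby


points = [(uniform(0, 100), uniform(0, 100)) for _ in range(1000)]
nearby_points = all_nearby(points)
```

For points spread fairly evenly this does roughly O(len(points)) work instead of O(len(points)**2); if most points pile up in a few cells, the cost drifts back toward the brute-force loop.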

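Here is a version with a fixed grid, where each grid point holds the number of samples that fall on it.

The search can then be narrowed down to the space around the point in question.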

from random import randint
import math

N = 100
N_SAMPLES = 1000

# create the grid
grd = [[0 for _ in range(N)] for __ in range(N)]

# set the number of points at a given gridpoint
for _ in range(N_SAMPLES):
    grd[randint(0, N - 1)][randint(0, N - 1)] += 1

def find_neighbours(grid, point, distance):

    # this will be: (x, y): number of points there
    points = {}

    # + 1 so that the upper end of the range (point + distance) is included
    for x in range(point[0] - distance, point[0] + distance + 1):
        if x < 0 or x > N - 1:
            continue
        for y in range(point[1] - distance, point[1] + distance + 1):
            if y < 0 or y > N - 1:
                continue
            dst = math.hypot(point[0] - x, point[1] - y)
            if dst > distance:
                continue
            # use the grid that was passed in, not the global grd
            if grid[x][y] > 0:
                points[(x, y)] = grid[x][y]
    return points

print(find_neighbours(grid=grd, point=(45, 36), distance=5))
# e.g. -> {(44, 37): 1, (45, 33): 1, ...}
# meaning: there is one neighbour at (44, 37) etc...

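Further optimization: the tests for x and y could be precomputed for a given grid size; then math.hypot(point[0]-x, point[1]-y) would not have to be computed again for every point.

And it might be a good idea to replace the grid with a numpy array.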
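A sketch of that precomputation, assuming the integer count grid `grd` from the answer: the `(dx, dy)` offsets that lie inside the circle are computed once per distance, so each query only does bounds checks and lookups. `disk_offsets` and `OFFSETS_5` are hypothetical names introduced here.

```python
import math
from random import randint

N = 100
N_SAMPLES = 1000


def disk_offsets(distance):
    """All integer (dx, dy) with hypot(dx, dy) <= distance, computed once."""
    return [
        (dx, dy)
        for dx in range(-distance, distance + 1)
        for dy in range(-distance, distance + 1)
        if math.hypot(dx, dy) <= distance
    ]


def find_neighbours(grid, point, offsets):
    """Return {(x, y): count} for occupied cells at the given offsets from point."""
    px, py = point
    points = {}
    for dx, dy in offsets:
        x, y = px + dx, py + dy
        # explicit bounds check: a negative index would silently wrap around
        if 0 <= x < N and 0 <= y < N and grid[x][y] > 0:
            points[(x, y)] = grid[x][y]
    return points


grd = [[0] * N for _ in range(N)]
for _ in range(N_SAMPLES):
    grd[randint(0, N - 1)][randint(0, N - 1)] += 1

OFFSETS_5 = disk_offsets(5)  # reuse for every query with distance=5
print(find_neighbours(grd, (45, 36), OFFSETS_5))
```

The offset list depends only on `distance`, so it can be cached once per distance you query with.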

UPDATE

If your points are floats, you can still create an int grid to reduce the search space:

from random import uniform
from collections import defaultdict
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def x_int(self):
        return int(self.x)

    @property
    def y_int(self):
        return int(self.y)

    def __str__(self):
        fmt = '''{0.__class__.__name__}(x={0.x:5.2f}, y={0.y:5.2f})'''
        return fmt.format(self)

N = 100
MIN = 0
MAX = N-1

N_SAMPLES = 1000


# create the grid
grd = [[[] for _ in range(N)] for __ in range(N)]

# put each point into the grid cell it falls into
for _ in range(N_SAMPLES):
    p = Point(x=uniform(MIN, MAX), y=uniform(MIN, MAX))
    grd[p.x_int][p.y_int].append(p)


def find_neighbours(grid, point, distance):

    # this will be: (x_int, y_int): list of points
    points = defaultdict(list)

    # need to cast a slightly bigger net on the upper end of the range;
    # int() rounds down
    for x in range(point[0]-distance, point[0]+distance+1):
        if x < 0 or x > N-1:
            continue
        for y in range(point[1]-distance, point[1]+distance+1):
            if y < 0 or y > N-1:
                continue
            # (x, y) is the lower-left corner of a 1x1 cell; every point in
            # the cell is at most sqrt(2) from that corner, so +1 is not
            # always enough: skip the cell only beyond distance + sqrt(2)
            dst = math.hypot(point[0]-x, point[1]-y)
            if dst > distance + math.sqrt(2):
                continue
            for pt in grid[x][y]:
                # measure from the query point, not from the cell corner
                if math.hypot(pt.x-point[0], pt.y-point[1]) <= distance:
                    points[(x, y)].append(pt)
    return points

res = find_neighbours(grid=grd, point=(45, 36), distance=5)

for int_point, points in res.items():
    print(int_point)
    for point in points:
        print(' ', point)

The output looks something like this (the points are random, so your values will differ):

(44, 36) 
    Point(x=44.03, y=36.93) 
(41, 36) 
    Point(x=41.91, y=36.55) 
    Point(x=41.73, y=36.53) 
    Point(x=41.56, y=36.88) 
...

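For convenience, Points are now a class. That may not be necessary, but...

Depending on how dense or sparse your points are, you could also represent the grid as a dict mapping cells to lists of Points...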
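A sketch of the dict variant. For brevity it uses plain (x, y) tuples instead of the `Point` class, which is an assumption rather than the answer's code, and the function names are hypothetical. Only occupied cells are stored, and `grid.get(..., ())` replaces the bounds checks, since a missing key just means an empty cell. As in the answer, the query point is an integer grid point.

```python
import math
from collections import defaultdict
from random import uniform


def build_sparse_grid(points):
    """Map (floor(x), floor(y)) -> list of (x, y); empty cells take no memory."""
    grid = defaultdict(list)
    for x, y in points:
        grid[(math.floor(x), math.floor(y))].append((x, y))
    return grid


def find_neighbours(grid, point, distance):
    """Points within `distance` of the integer grid point `point`."""
    px, py = point
    found = []
    for x in range(px - distance, px + distance + 1):
        for y in range(py - distance, py + distance + 1):
            # .get() avoids inserting empty lists into the defaultdict
            for pt in grid.get((x, y), ()):
                if math.hypot(pt[0] - px, pt[1] - py) <= distance:
                    found.append(pt)
    return found


grid = build_sparse_grid([(uniform(0, 99), uniform(0, 99)) for _ in range(1000)])
print(len(find_neighbours(grid, (45, 36), 5)))
```

Memory then grows with the number of occupied cells rather than with N * N, which mainly pays off when the points are sparse or cover a large coordinate range.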

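Also, in this version the find_neighbours function only accepts a starting point made of ints. This could be improved as well.

There is still plenty of room for improvement: the range on the y axis could be limited using trigonometry. And for points well inside the circle, no individual check is needed; the detailed check only has to be done near the outer edge of the circle.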
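One way to accept float query points (and a float `distance`), sketched with a list-of-lists grid that stores (x, y) tuples; the tuple representation and function names are assumptions for this example. The first and last cell index that can hold a match come from `math.floor`, clamped to the grid, which also removes the per-cell bounds checks. It assumes every stored point lies in [0, N).

```python
import math
from random import uniform

N = 100


def build_grid(points):
    """N x N grid; cell [i][j] holds the points with int(x) == i and int(y) == j."""
    grid = [[[] for _ in range(N)] for _ in range(N)]
    for x, y in points:
        grid[int(x)][int(y)].append((x, y))
    return grid


def find_neighbours(grid, point, distance):
    """Points within `distance` of `point`; point and distance may be floats."""
    px, py = point
    # Range of cells that can contain a match, clamped to the grid bounds.
    x_lo = max(0, math.floor(px - distance))
    x_hi = min(N - 1, math.floor(px + distance))
    y_lo = max(0, math.floor(py - distance))
    y_hi = min(N - 1, math.floor(py + distance))
    found = []
    for x in range(x_lo, x_hi + 1):
        for y in range(y_lo, y_hi + 1):
            for pt in grid[x][y]:
                if math.hypot(pt[0] - px, pt[1] - py) <= distance:
                    found.append(pt)
    return found


grid = build_grid([(uniform(0, N - 1), uniform(0, N - 1)) for _ in range(1000)])
print(find_neighbours(grid, (45.3, 36.7), 4.5))
```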
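A sketch of both improvements, assuming a sparse dict grid of unit cells keyed by `(floor(x), floor(y))` and (x, y) tuples; all helper names are hypothetical. For each column of cells, Pythagoras (the same bound the trigonometry gives) limits how far the circle reaches vertically, which shrinks the y range. If a cell's farthest corner is inside the circle, all of its points are accepted without individual distance checks, so only cells on the circle's edge are checked point by point.

```python
import math
from collections import defaultdict


def build_grid(points):
    """Sparse grid: (floor(x), floor(y)) -> list of (x, y)."""
    grid = defaultdict(list)
    for x, y in points:
        grid[(math.floor(x), math.floor(y))].append((x, y))
    return grid


def nearest_offset(p, lo):
    """Distance from coordinate p to the interval [lo, lo + 1] (0 if inside)."""
    return max(lo - p, 0.0, p - (lo + 1))


def farthest_offset(p, lo):
    """Largest distance from coordinate p to either end of [lo, lo + 1]."""
    return max(abs(p - lo), abs(p - (lo + 1)))


def find_neighbours(grid, point, distance):
    """Points within `distance` of `point` (floats allowed)."""
    px, py = point
    found = []
    for x in range(math.floor(px - distance), math.floor(px + distance) + 1):
        dx_near = nearest_offset(px, x)
        if dx_near > distance:
            continue
        # Half-height of the circle over this column of cells.
        half = math.sqrt(distance ** 2 - dx_near ** 2)
        dx_far = farthest_offset(px, x)
        for y in range(math.floor(py - half), math.floor(py + half) + 1):
            cell = grid.get((x, y))
            if not cell:
                continue
            if math.hypot(dx_far, farthest_offset(py, y)) <= distance:
                # Whole cell lies inside the circle: no per-point check needed.
                found.extend(cell)
            else:
                # Cell straddles the edge of the circle: check each point.
                found.extend(pt for pt in cell
                             if math.hypot(pt[0] - px, pt[1] - py) <= distance)
    return found
```

The per-point work is then concentrated in the ring of cells the circle's edge passes through.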


2016-12-24 13:08:30

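Thanks. What if the points are floats rather than integers? This approach only works if we convert the floats to integers – mchen

I think it's possible to adapt the approach above: instead of searching on a fixed grid, search between bisect_left((point[0] +/- distance, point[1] +/- distance), point) – mchen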
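A minimal sketch of the bisect idea from that comment, assuming the points are kept sorted by x in a plain list; `XSortedIndex` is a made-up name. Two binary searches find the vertical slab `px - distance <= x <= px + distance`, and only the points in that slab get a full distance check. A parallel list of x values is bisected because the `key=` argument of `bisect` only exists from Python 3.10 on.

```python
import math
from bisect import bisect_left, bisect_right
from random import uniform


class XSortedIndex:
    """Points sorted by x; a query scans only the slab |x - px| <= distance."""

    def __init__(self, points):
        self.points = sorted(points)           # sorts by x, then by y
        self.xs = [p[0] for p in self.points]  # parallel list of keys for bisect

    def find_neighbours(self, point, distance):
        px, py = point
        lo = bisect_left(self.xs, px - distance)   # first x >= px - distance
        hi = bisect_right(self.xs, px + distance)  # first x > px + distance
        return [p for p in self.points[lo:hi]
                if math.hypot(p[0] - px, p[1] - py) <= distance]


index = XSortedIndex([(uniform(0, 100), uniform(0, 100)) for _ in range(1000)])
print(index.find_neighbours((45.0, 36.0), 5))
```

This prunes along x only, so every point in the slab is still checked; a grid also prunes along y, which matters when many points share similar x values.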
