2014-12-24 19 views
1

Composite GIN/GiST index on UUID, timestamp, and geometry

I am trying to optimize a query against the following table:

CREATE TABLE t (
    uuid4 UUID PRIMARY KEY
    , arr TEXT[]
    , geom GEOMETRY
    , ts TIMESTAMP WITHOUT TIME ZONE
);
CREATE INDEX ON t USING GIST (geom);

The query looks like this:

explain analyze 
SELECT kmeans 
, count(*)::int 
, ST_X(ST_Centroid(ST_Collect(geom))) AS lon 
, ST_Y(ST_Centroid(ST_Collect(geom))) AS lat 
, STRING_TO_ARRAY(STRING_AGG(ARRAY_TO_STRING(arr, ','), ','), ',') AS arr 
FROM (
    SELECT kmeans(ARRAY[ST_X(geom), ST_Y(geom)], 25) OVER(), geom, arr 
    FROM t 
    WHERE ts > NOW() - '12 hours'::interval 
    AND geom IS NOT NULL 
    AND uuid4 != '9ab0f8cd-9707-41da-8e30-6d29a0f22242'::uuid 
    AND arr @> (SELECT arr FROM t WHERE uuid4 = '9ab0f8cd-9707-41da-8e30-6d29a0f22242'::uuid LIMIT 1) 
    AND ST_Distance_Sphere(ST_MakePoint(-77, 38), geom) < 10000 
) AS ksub 
GROUP BY kmeans 
ORDER BY kmeans; 

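Essentially, it finds all rows within a certain distance and within a certain time range that have geom populated and whose arr contains every item in the specified arr. It then clusters those rows using the kmeans function from kmeans-postgresql. This is the plan I currently see: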

GroupAggregate (cost=347.69..349.59 rows=38 width=98) (actual time=50.034..50.384 rows=25 loops=1) 
    -> Sort (cost=347.69..347.78 rows=38 width=98) (actual time=49.994..49.999 rows=99 loops=1) 
     Sort Key: (kmeans(ARRAY[st_x(t.geom), st_y(t.geom)], 25) OVER (?)) 
     Sort Method: quicksort Memory: 42kB 
     -> WindowAgg (cost=25.18..346.31 rows=38 width=94) (actual time=49.955..49.968 rows=99 loops=1) 
       InitPlan 1 (returns $0) 
       -> Limit (cost=0.29..8.30 rows=1 width=62) (actual time=0.018..0.018 rows=1 loops=1) 
         -> Index Scan using t_uuid4_ts_idx on t t_1 (cost=0.29..8.30 rows=1 width=62) (actual time=0.017..0.017 rows=1 loops=1) 
          Index Cond: (uuid4 = '9ab0f8cd-9707-41da-8e30-6d29a0f22242'::uuid) 
       -> Bitmap Heap Scan on t (cost=16.88..337.34 rows=38 width=94) (actual time=13.363..49.747 rows=99 loops=1) 
        Recheck Cond: (arr @> $0) 
        Filter: ((geom IS NOT NULL) AND (uuid4 <> '9ab0f8cd-9707-41da-8e30-6d29a0f22242'::uuid) AND (ts > (now() - '12:00:00'::interval)) AND (_st_distance('0101000020E610000000000000004053C00000000000004340'::geography, geography(geom), 0::double precision, false) < 10000::double precision))
        Rows Removed by Filter: 22989 
        -> Bitmap Index Scan on t_arr_idx (cost=0.00..16.87 rows=115 width=0) (actual time=13.072..13.072 rows=23089 loops=1) 
          Index Cond: (arr @> $0) 
Total runtime: 50.464 ms 

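It would seem that the bitmap heap scan plus bitmap index scan is an optimal indexing solution, but I have been wondering whether there is a way to avoid the extra filter and recheck. Can I improve performance by building alternative indexes? I have already tried: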

Indexes: 
    "t_pkey" PRIMARY KEY, btree (uuid4) 
    "t_geom_idx" gist (geom) 
    "t_geom_ts_idx" gist (geom, ts) 
    "t_geom_ts_uuid4_idx" gist (geom, ts, (uuid4::text)) 
    "t_iam_idx" gin (arr) 
    "t_ts_geom_idx" gist (ts, geom) 
    "t_ts_geom_uuid4_idx" gist (ts, geom, (uuid4::text)) 
    "t_ts_uuid4_geom_idx" gist (ts, (uuid4::text), geom) 
    "t_uuid4_ts_idx" btree (uuid4, ts) 

Note that kmeans is an extension from https://github.com/umitanuki/kmeans-postgresql.

+2

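Nice query. Have you tried using ST_DWithin instead of ST_Distance_Sphere? It will probably make better use of the spatial index instead of actually computing all of those distances. –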

+0

That was the ticket. I am posting the results of your suggestion as an answer. Thanks! – Justin

Answer

1

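Following John Barça's suggestion, I used ST_DWithin with a GiST index on my geometry and timestamp, and cut the runtime of the same query posted above to under 10 ms. The only tricky part was realizing that I needed the distance in degrees rather than meters for geometry (geography can use meters). This question pointed me to a solution that is accurate enough for my purposes: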

AND ST_DWithin(ST_MakePoint(-77.0710820577842, 37.9940763922052), geom, 10000/(111.31 * 1000 * COS(ST_Y(ST_MakePoint(-77.0710820577842, 37.9940763922052)) * Pi()/180)))
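
To make the meters-to-degrees conversion in that expression easier to check, here is a minimal Python sketch of the same arithmetic. It is only an illustration, not part of the query: the function name meters_to_degrees and the constant name METERS_PER_DEGREE are hypothetical, and the 111.31 km-per-degree figure is simply the one used in the expression above.

```python
import math

# Approximate meters per degree, taken from the answer's expression (111.31 * 1000).
METERS_PER_DEGREE = 111.31 * 1000


def meters_to_degrees(radius_m, lat_deg):
    """Return the span in degrees of longitude that covers radius_m at latitude lat_deg.

    Mirrors: radius / (111.31 * 1000 * COS(lat * Pi() / 180)).
    """
    return radius_m / (METERS_PER_DEGREE * math.cos(math.radians(lat_deg)))


# Same inputs as the ST_DWithin call: a 10 km radius at the query point's latitude.
radius_deg = meters_to_degrees(10000, 37.9940763922052)

# A degree of latitude stays close to METERS_PER_DEGREE, so the same radius
# in degrees reaches farther in the north-south direction.
north_south_m = radius_deg * METERS_PER_DEGREE

print(radius_deg, north_south_m)
```

At this latitude the radius comes out at roughly 0.114 degrees. Because the cosine factor scales for east-west distance, the same radius covers somewhat more than 10 km north-south, so the filter is slightly over-inclusive rather than exact, which is consistent with the answer calling it "accurate enough".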