2011-07-27 81 views
1

這個問題讓我頭疼一陣子。我有一個PostgreSQL 8.4數據庫,它只包含一個包含超過4.000.000條記錄的表。此表的結構如下:確定箱子之間的距離

CREATE TABLE metadata (
    id serial NOT NULL, 
    "value" text NOT NULL DEFAULT ''::text, 
    segment_box box NOT NULL DEFAULT box(point(((-9223372036854775808)::bigint)::double precision, (1)::double precision), point((9223372036854775807::bigint)::double precision, ((-1))::double precision)), 
    CONSTRAINT metadata_pk PRIMARY KEY (id) 
) 

CREATE INDEX metadata_segment_box_ix 
    ON metadata 
    USING gist 
    (segment_box); 

CREATE INDEX metadata_tag_value_ix 
    ON metadata 
    USING btree 
    (value); 

該表包含以矩形框表示的段(時間),表示爲矩形框。這些段使用「值」列註釋。

我想在數據庫上執行的查詢類型嘗試查找包含在特定窗口中的指定值的所有段。成功地實現了這樣的查詢是:

SELECT * FROM (SELECT * FROM metadata WHERE value='X') a, 
(SELECT * FROM metadata WHERE AND value='Y') b 
WHERE a.segment_box <-> b.segment_box <= 3000 

但是,正如你可能已經注意到,這個查詢不能有效地通過數據庫中執行。子查詢a和b的笛卡爾積變得非常大。有沒有辦法更有效地執行這些查詢?我可以想象某種滑動窗口方法可以做到這一點。也許類似如下:

SELECT *, rank() OVER (
PARTITION BY "value" ORDER BY (segment_box[1])[0], (segment_box[0])[0] 
) FROM metadata WHERE value='X' OR value='Y' 

更新: 一個我張貼這個問題是建立在Postgres的自定義功能後嘗試的事情。我想:

CREATE OR REPLACE FUNCTION within_window(size bigint DEFAULT 0) 
    RETURNS setof metadata AS 
$BODY$DECLARE 
    segment RECORD; 
    neighbour RECORD; 
    newwindow box; 
BEGIN 
    FOR segment IN (
    SELECT * FROM metadata WHERE value='X' OR value='Y' 
     ORDER BY (segment_box[1])[0], (segment_box[0])[0] 
) LOOP 
    newwindow := box(segment.segment_box[0], 
     point((((segment.segment_box[1])[0]) + size), (segment.segment_box[1])[1])); 
    FOR neighbour IN (
     SELECT DISTINCT ON (metadata_id) * FROM metadata WHERE value='X' OR value='Y') 
     AND segment_box &< newwindow 
     AND segment_box &> newwindow 
    ) LOOP 
     RETURN NEXT neighbour; 
    END LOOP; 
    END LOOP; 
END;$BODY$ 
    LANGUAGE plpgsql; 

然而,這種功能是作爲我如上所述,因爲必須執行多次的子查詢鹼性溶液緩慢。任何其他想法呢?

+1

在您的問題中,您有'postgis'作爲標記,但您在此處未使用PostGIS。如果你這樣做了,你將擁有像[ST_DWithin](http://postgis.org/documentation/manual-svn/ST_DWithin.html)這樣的好緩衝區功能來幫助你。 –

+0

您認爲使用ST_DWithin函數將使查詢與答案中顯示的函數相比更快嗎? – joost1024

回答

2

我用一種掃描線算法自己解決了這個問題。只執行一個查詢。我使用遊標來回掃描查詢的結果集。得到的算法的工作原理如下:

CREATE OR REPLACE FUNCTION within_window(size bigint DEFAULT 0) 
    RETURNS setof metadata AS 
$BODY$DECLARE 
crsr SCROLL CURSOR FOR (SELECT * FROM metadata WHERE value='X' OR value='Y' ORDER BY (segment_box[1])[0], (segment_box[0])[0]); 
rc RECORD; 
rcc RECORD; 
crsr_position int; 
last_crsr int; 
BEGIN 
    OPEN crsr; 
    crsr_position := 0; 
    LOOP FETCH NEXT FROM crsr INTO rc; 
     IF NOT FOUND THEN 
      EXIT; 
     END IF; 
     last_crsr := crsr_position; 
     LOOP FETCH NEXT FROM crsr INTO rcc; 
      IF NOT FOUND THEN 
       EXIT; 
      ELSEIF 
       rcc.segment_box &< box(rc.segment_box[0], point((((rc.segment_box[1])[0]) + size), (rc.segment_box[1])[1])) AND 
       rcc.segment_box &> box(rc.segment_box[0], point((((rc.segment_box[1])[0]) + size), (rc.segment_box[1])[1])) 
      THEN 
       RETURN NEXT rcc; 
      ELSE 
       EXIT; 
      END IF; 
     END LOOP; 
     crsr_position := last_crsr + 1; 
     MOVE ABSOLUTE crsr_position FROM crsr; 
    END LOOP; 
    CLOSE crsr; 
END;$BODY$ 
    LANGUAGE plpgsql; 

使用此功能的查詢只需要476毫秒,而不是6+分鐘(4+萬行的數據庫)!