2014-01-13 80 views
2

PostgreSQL HashAggregate query optimization

I want to optimize the following query.

select cellid2 as cellid, max(endeks) as turkcell 
from (select a.cellid2 as cellid2, b.endeks 
    from (select geom, cellid as cellid2 from grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000) a join (select endeks, st_transform(geom, 2320) as geom_tmp from turkcell_data) b on st_intersects(a.geom, b.geom_tmp)) x 
group by cellid2 limit 5 

and EXPLAIN ANALYZE reports:

"Limit (cost=81808.31..81808.36 rows=5 width=12) (actual time=271376.201..271376.204 rows=5 loops=1)" 
" -> HashAggregate (cost=81808.31..81879.63 rows=7132 width=12) (actual time=271376.200..271376.203 rows=5 loops=1)" 
"  -> Nested Loop (cost=0.00..81772.65 rows=7132 width=12) (actual time=5.128..269753.647 rows=1237707 loops=1)" 
"    Join Filter: _st_intersects(grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000.geom, st_transform(turkcell_data.geom, 2320))" 
"    -> Seq Scan on turkcell_data (cost=0.00..809.40 rows=3040 width=3045) (actual time=0.031..7.426 rows=3040 loops=1)" 
"    -> Index Scan using grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist on grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 (cost=0.00..24.76 rows=7 width=124) (actual time=0.012..0.799 rows=647 loops=3040)" 
"     Index Cond: (geom && st_transform(turkcell_data.geom, 2320))" 
"Total runtime: 271387.499 ms" 

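There are indexes on the geometry column and on the cellid column. I have read that, instead of using max, it is better to use ORDER BY ... DESC with LIMIT 1. However, since I have a GROUP BY clause, I don't think that would work. Is there a way to do this, or some other way to improve performance?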
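For reference, PostgreSQL's grouped counterpart of "ORDER BY ... DESC LIMIT 1" is DISTINCT ON. The sketch below applies it to the query above, rewritten as a flat join. It is only an illustration of the idiom: in the plan shown, nearly all of the time is spent inside the Nested Loop rather than in the aggregate, so this change alone is unlikely to help much.

```sql
-- One row per cellid: the row with the largest endeks in each group.
-- NULLS LAST mirrors max(), which ignores NULL values.
SELECT DISTINCT ON (a.cellid)
       a.cellid,
       b.endeks AS turkcell
FROM grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 AS a
JOIN turkcell_data AS b
  ON ST_Intersects(a.geom, ST_Transform(b.geom, 2320))
ORDER BY a.cellid, b.endeks DESC NULLS LAST
LIMIT 5;
```

Unlike the original, which returns an arbitrary five groups, this version returns the five lowest cellid values because of the ORDER BY.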

Table definitions:

CREATE TABLE grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 
(
    regionid numeric, 
    geom geometry(Geometry,2320), 
    cellid integer, 
    turkcell double precision 
) 
WITH (
    OIDS=FALSE 
); 
ALTER TABLE grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 
    OWNER TO postgres; 

-- Index: grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_cellid 

-- DROP INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_cellid; 

CREATE INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_cellid 
    ON grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 
    USING btree 
    (cellid); 

-- Index: grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist 

-- DROP INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist; 

CREATE INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist 
    ON grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 
    USING gist 
    (geom); 

CREATE TABLE turkcell_data 
(
    gid serial NOT NULL, 
    objectid_1 integer, 
    objectid integer, 
    neighbourh numeric, 
    endeks numeric, 
    coorx numeric, 
    coory numeric, 
    shape_leng numeric, 
    shape_le_1 numeric, 
    shape_area numeric, 
    geom geometry(MultiPolygon,4326), 
    CONSTRAINT turkcell_data_pkey PRIMARY KEY (gid) 
) 
WITH (
    OIDS=FALSE 
); 
ALTER TABLE turkcell_data 
    OWNER TO postgres; 

-- Index: turkcell_data_geom_gist 

-- DROP INDEX turkcell_data_geom_gist; 

CREATE INDEX turkcell_data_geom_gist 
    ON turkcell_data 
    USING gist 
    (geom); 
+1

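If you want us to help optimize a query, you need to show us the table and index definitions, as well as the row count for each table. Maybe your tables are defined poorly. Maybe the indexes weren't created correctly. Maybe you don't have an index on the column you thought you did. Without seeing the table and index definitions, we can't tell. We also need the row counts because they greatly affect query optimization. –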
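A minimal sketch of how the requested information could be collected, assuming the two tables named in the question live in a schema on the current search_path:

```sql
-- Row counts for both tables.
SELECT count(*) FROM grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000;
SELECT count(*) FROM turkcell_data;

-- Index definitions as PostgreSQL has them recorded.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000',
                    'turkcell_data');
```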

+0

I have added the necessary definitions. – adaminasabi

+0

Your nested loop has subqueries with N = 7 and N = 3040, and results in N = 1237707 rows. That is worse than a Cartesian product! – joop
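The arithmetic behind that observation can be reproduced from the figures in the EXPLAIN ANALYZE output. The column aliases below are only illustrative labels:

```sql
-- Figures copied from the EXPLAIN ANALYZE output in the question.
SELECT 3040 * 7   AS estimated_index_matches,  -- 21280: planner estimate (rows=7 per loop)
       3040 * 647 AS actual_index_matches,     -- 1966880: rows=647 per loop (an average), loops=3040
       1237707    AS rows_after_join_filter;   -- pairs that passed _st_intersects
```

The index scan itself accounts for roughly 0.799 ms x 3040 loops, or about 2.4 seconds, of the roughly 270 seconds spent in the Nested Loop. That suggests most of the runtime goes into evaluating the join filter (st_transform plus _st_intersects) on close to two million candidate pairs.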

Answers

2

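Either store your data reprojected to 2320, index that column, and use it in your join, or create an index on the transformed geometry in turkcell_data. I usually prefer the latter: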

CREATE INDEX turkcell_data_geom_gist2320 
    ON turkcell_data 
    USING gist 
    (st_transform(geom, 2320)); 
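The first option (storing the reprojected geometry) might look roughly like the sketch below. The column name geom_2320 and the index name turkcell_data_geom_2320_gist are hypothetical, chosen only for illustration.

```sql
-- geom_2320 and turkcell_data_geom_2320_gist are hypothetical names.
ALTER TABLE turkcell_data
    ADD COLUMN geom_2320 geometry(MultiPolygon, 2320);

-- Reproject once, instead of on every comparison in the join.
UPDATE turkcell_data
SET geom_2320 = ST_Transform(geom, 2320);

CREATE INDEX turkcell_data_geom_2320_gist
    ON turkcell_data
    USING gist
    (geom_2320);

-- Refresh planner statistics for the new column.
ANALYZE turkcell_data;

-- The join now compares two columns stored in the same SRID.
SELECT a.cellid, max(b.endeks) AS turkcell
FROM grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 AS a
JOIN turkcell_data AS b
  ON ST_Intersects(a.geom, b.geom_2320)
GROUP BY a.cellid
LIMIT 5;
```

The trade-off is that the stored column has to be kept in sync whenever geom changes, which the expression index handles automatically.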

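Another issue may be that your geometries are very complex: if any of your polygons has a relatively large number of points, you may be stuck crunching away at the intersection tests. Try the index first, though.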
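One way to check for overly complex polygons is to count vertices with ST_NPoints. This is only a quick diagnostic sketch, and what counts as "too many" points depends on the data:

```sql
-- The ten geometries in turkcell_data with the most vertices.
SELECT gid, ST_NPoints(geom) AS npoints
FROM turkcell_data
ORDER BY npoints DESC
LIMIT 10;

-- Overall distribution of vertex counts.
SELECT count(*)              AS features,
       min(ST_NPoints(geom)) AS min_points,
       avg(ST_NPoints(geom)) AS avg_points,
       max(ST_NPoints(geom)) AS max_points
FROM turkcell_data;
```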

+0

I added this index, but it didn't change much. – adaminasabi

+0

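First, can you "set enable_seqscan = false", then run the query and post the EXPLAIN ANALYZE output (to verify that the index was created correctly)? Then "set enable_seqscan = true", run it again, and post that EXPLAIN ANALYZE output too. Can you also describe your layers? It looks like one layer has about 600 features and the other about 3000. Do any of the geometries have an excessive number of points? Does almost every geometry in the first layer intersect almost every geometry in the second? – yieldsfalsehood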
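The check requested here could be run roughly as follows. The query is the one from the question, written as a flat join for brevity; the setting only affects the current session.

```sql
-- Discourage sequential scans so the planner prefers the new index if it is usable.
SET enable_seqscan = false;

EXPLAIN ANALYZE
SELECT a.cellid, max(b.endeks) AS turkcell
FROM grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 AS a
JOIN turkcell_data AS b
  ON ST_Intersects(a.geom, ST_Transform(b.geom, 2320))
GROUP BY a.cellid
LIMIT 5;

-- Restore the default and compare the two plans.
SET enable_seqscan = true;

EXPLAIN ANALYZE
SELECT a.cellid, max(b.endeks) AS turkcell
FROM grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 AS a
JOIN turkcell_data AS b
  ON ST_Intersects(a.geom, ST_Transform(b.geom, 2320))
GROUP BY a.cellid
LIMIT 5;
```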