2015-06-30 36 views
2

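Optimizing a PostGIS query, ST_Intersects

I want to run a spatial query between two tables. Table one (prism_ppt_monthly, details below) holds monthly precipitation data. Table two (usgs_basin_boundary, details below) holds polygons of hydrologic basin boundaries.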

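I want to create a time series of total precipitation for each basin. I have a query that does this (details below), but a single computation takes almost 4.75 seconds. Given that I have 1440 months of precipitation data and nearly 40 basins, this query would take about 4.75 s * 1440 * 40 = 273,600 s, or roughly 76 hours.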
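The runtime estimate is plain arithmetic and can be checked directly in psql. This is only a rough sketch: it assumes the 4.75 s cost per (month, basin) computation stays constant, which may not hold in practice.

```sql
-- Assumption: every (month, basin) pair costs the same 4.75 s
-- measured for the single LIMIT 1 query.
SELECT 4.75 * 1440 * 40          AS total_seconds,
       4.75 * 1440 * 40 / 3600.0 AS total_hours;
```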

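Below are the query and the table definitions. I have a spatial (GiST) index on each table, and both tables have been VACUUM ANALYZEd. Any ideas on how I might speed this up would be greatly appreciated!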

QUERY:

EXPLAIN ANALYZE 
SELECT filename,date_from,date_to,site_no,sqmi,(ST_SummaryStats(rast)).* FROM prism_ppt_monthly, usgs_basin_boundary WHERE ST_Intersects(rast,ST_Transform(geom,4269)) LIMIT 1; 
                      QUERY PLAN                   
    ------------------------------------------------------------------------------------------------------------------------------------------------------------ 
    Limit (cost=65964.53..66000.10 rows=1 width=81) (actual time=4764.969..4764.972 rows=1 loops=1) 
     -> Nested Loop (cost=65964.53..66782.60 rows=23 width=81) (actual time=4764.963..4764.963 rows=1 loops=1) 
      Join Filter: _st_intersects(st_transform(usgs_basin_boundary.geom, 4269), prism_ppt_monthly.rast, NULL::integer) 
      -> Hash Semi Join (cost=65964.53..66610.73 rows=47 width=126256) (actual time=4587.961..4587.961 rows=1 loops=1) 
        Hash Cond: ((usgs_basin_boundary.site_no)::text = df_flow.code) 
        -> Seq Scan on usgs_basin_boundary (cost=0.00..639.09 rows=2509 width=126256) (actual time=0.007..1.279 rows=595 loops=1) 
        -> Hash (cost=65963.94..65963.94 rows=47 width=9) (actual time=4585.313..4585.313 rows=47 loops=1) 
         Buckets: 1024 Batches: 1 Memory Usage: 2kB 
         -> HashAggregate (cost=65963.00..65963.47 rows=47 width=9) (actual time=4585.126..4585.215 rows=47 loops=1) 
           -> Seq Scan on df_flow (cost=0.00..63244.20 rows=1087520 width=9) (actual time=5.826..2367.593 rows=1087520 loops=1) 
      -> Index Scan using prism_ppt_monthly_rast_gist on prism_ppt_monthly (cost=0.00..0.40 rows=1 width=64) (actual time=0.034..0.034 rows=1 loops=1) 
        Index Cond: ((rast)::geometry && st_transform(usgs_basin_boundary.geom, 4269)) 
    Total runtime: 4765.151 ms 

TABLE 1:

\d+ prism_ppt_monthly 
              Table "public.prism_ppt_monthly" 
     Column | Type |       Modifiers       | Storage | Description 
    -----------+---------+-----------------------------------------------------------------+----------+------------- 
    rid  | integer | not null default nextval('prism_ppt_monthly_rid_seq'::regclass) | plain | 
    rast  | raster |                 | extended | 
    filename | text |                 | extended | 
    date_from | date |                 | plain | 
    date_to | date |                 | plain | 
    Indexes: 
     "prism_ppt_monthly_pkey" PRIMARY KEY, btree (rid) 
     "prism_ppt_monthly_rast_gist" gist (st_convexhull(rast)) 
    Check constraints: 
     "enforce_height_rast" CHECK (st_height(rast) = 621) 
     "enforce_max_extent_rast" CHECK (st_coveredby(st_convexhull(rast), '0103000020AD10000001000000050000005555555555415FC01E01000000F8484060A9AAAAAA9E50C01E01000000F8484060A9AAAAAA9E50C0F5FFFFFFFF0F38405555555555415FC0F5FFFFFFFF0F38405555555555415FC01E01000000F84840'::geometry)) 
     "enforce_nodata_values_rast" CHECK (_raster_constraint_nodata_values(rast)::numeric(16,10)[] = '{-9999}'::numeric(16,10)[]) 
     "enforce_num_bands_rast" CHECK (st_numbands(rast) = 1) 
     "enforce_out_db_rast" CHECK (_raster_constraint_out_db(rast) = '{f}'::boolean[]) 
     "enforce_pixel_types_rast" CHECK (_raster_constraint_pixel_types(rast) = '{32BF}'::text[]) 
     "enforce_same_alignment_rast" CHECK (st_samealignment(rast, '0100000000365755555555A53F365755555555A5BF5555555555415FC01E01000000F8484000000000000000000000000000000000AD10000001000100'::raster)) 
     "enforce_scalex_rast" CHECK (st_scalex(rast)::numeric(16,10) = 0.04166666666667::numeric(16,10)) 
     "enforce_scaley_rast" CHECK (st_scaley(rast)::numeric(16,10) = (-0.04166666666667)::numeric(16,10)) 
     "enforce_srid_rast" CHECK (st_srid(rast) = 4269) 
     "enforce_width_rast" CHECK (st_width(rast) = 1405) 
    Has OIDs: no 

TABLE 2:

\d+ usgs_basin_boundary 
               Table "public.usgs_basin_boundary" 
    Column |   Type    |        Modifiers        | Storage | Description 
----------+-----------------------------+-------------------------------------------------------------------+----------+------------- 
gid  | integer      | not null default nextval('usgs_basin_boundary_gid_seq'::regclass) | plain | 
site_no | character varying(15)  |                 | extended | 
sqmi  | numeric      |                 | main  | 
abs_diff | numeric      |                 | main  | 
geom  | geometry(MultiPolygon,5070) |                 | main  | 
Indexes: 
    "usgs_basin_boundary_pkey" PRIMARY KEY, btree (gid) 
    "usgs_basin_boundary_shape_gist" gist (geom) 
Has OIDs: no 
+0

Are you sure the explain plan and the query match? I can't see where 'df_flow' comes from.

Answers

1

The index on usgs_basin_boundary.geom is not being used because you call ST_Transform(geom,4269). You should create an index on the transformed result (as mentioned in the manual):

CREATE INDEX jkb_usgs_basin_boundary_geom_t_4269 
    ON usgs_basin_boundary 
    USING gist 
    (ST_Transform(geom,4269)) 
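As a rough sketch of how this index would be exercised: an expression index can only be considered when the query uses the same expression that was indexed, so the WHERE clause below repeats ST_Transform(geom, 4269) as in the index definition. Whether the planner actually chooses the index depends on its cost estimates, so checking the new plan is worthwhile.

```sql
-- Refresh planner statistics after creating the expression index.
ANALYZE usgs_basin_boundary;

-- Same query as in the question; the ST_Transform(geom, 4269)
-- expression matches the one used in the index definition.
EXPLAIN ANALYZE
SELECT filename, date_from, date_to, site_no, sqmi,
       (ST_SummaryStats(rast)).*
FROM prism_ppt_monthly, usgs_basin_boundary
WHERE ST_Intersects(rast, ST_Transform(geom, 4269))
LIMIT 1;
```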
0

It won't solve all of your problems, but I just stumbled over something in the query:

SELECT (...) WHERE ST_Intersects(rast,ST_Transform(geom,4269)) LIMIT 1; 

The important part is this one:

ST_Transform(geom,4269) 

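You are reprojecting the geometry in the middle of the query. While this is certainly possible, it is probably not good practice. You could instead project 'geom' to SRID 4269 beforehand and store the result in another table. After that, you only need to read the already transformed geometry from that table.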

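After transforming geom into the new table, you will probably also want to create a spatial index on that table. This may improve performance.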
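A minimal sketch of that approach follows. The table name usgs_basin_boundary_4269 and the index name are hypothetical, chosen only for illustration; the column list is taken from the table definition in the question.

```sql
-- Hypothetical table holding the basins already reprojected to SRID 4269.
CREATE TABLE usgs_basin_boundary_4269 AS
SELECT gid, site_no, sqmi, abs_diff,
       ST_Transform(geom, 4269)::geometry(MultiPolygon, 4269) AS geom
FROM usgs_basin_boundary;

-- Spatial index on the pre-transformed geometry (hypothetical name).
CREATE INDEX usgs_basin_boundary_4269_geom_gist
    ON usgs_basin_boundary_4269
    USING gist (geom);

-- Collect statistics for the new table.
ANALYZE usgs_basin_boundary_4269;

-- The original query, now without ST_Transform in the WHERE clause.
SELECT filename, date_from, date_to, site_no, sqmi,
       (ST_SummaryStats(rast)).*
FROM prism_ppt_monthly AS p
JOIN usgs_basin_boundary_4269 AS b
  ON ST_Intersects(p.rast, b.geom)
LIMIT 1;
```

The trade-off is that the copy has to be rebuilt or kept in sync whenever usgs_basin_boundary changes, whereas an expression index on the original table is maintained automatically.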