2010-05-25

A query (see below) extracts climate data from the weather stations within a given radius of a city, restricted to the dates for which those stations actually have data. The query uses the table's unique index:

CREATE UNIQUE INDEX measurement_001_stc_idx 
    ON climate.measurement_001 
    USING btree 
    (station_id, taken, category_id); 

Random Page Cost and Planning

Lowering the server's random_page_cost configuration value from 2.0 to 1.1 produced a massive performance gain (nearly an order of magnitude) for the given radius, because it nudged PostgreSQL toward using the index. Results now return in about 5 seconds (down from roughly 85 seconds), but problematic lines remain: pushing the query's end date out by a single year causes a full table scan:

sc.taken_start >= '1900-01-01'::date AND 
sc.taken_end <= '1997-12-31'::date AND 
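As an aside, the random_page_cost change described above can be tried per session before persisting it in postgresql.conf; a minimal sketch:

```sql
-- Session-level, for experimentation only:
SET random_page_cost = 1.1;

-- To persist it, edit postgresql.conf and reload:
-- random_page_cost = 1.1
```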

How can PostgreSQL be convinced to use the index between the two dates, regardless of the years involved? (A full table scan of 43 million rows is probably not the best plan.) The EXPLAIN ANALYZE results appear below the query.

Thank you!

Query

SELECT 
    extract(YEAR FROM m.taken) AS year, 
    avg(m.amount) AS amount 
    FROM 
    climate.city c, 
    climate.station s, 
    climate.station_category sc, 
    climate.measurement m 
    WHERE 
    c.id = 5182 AND 
    earth_distance(
     ll_to_earth(c.latitude_decimal,c.longitude_decimal), 
     ll_to_earth(s.latitude_decimal,s.longitude_decimal))/1000 <= 30 AND 
    s.elevation BETWEEN 0 AND 3000 AND 
    s.applicable = TRUE AND 
    sc.station_id = s.id AND 
    sc.category_id = 1 AND 
    sc.taken_start >= '1900-01-01'::date AND 
    sc.taken_end <= '1996-12-31'::date AND 
    m.station_id = s.id AND 
    m.taken BETWEEN sc.taken_start AND sc.taken_end AND 
    m.category_id = sc.category_id 
    GROUP BY 
    extract(YEAR FROM m.taken) 
    ORDER BY 
    extract(YEAR FROM m.taken) 

1900 to 1996: Index

"Sort (cost=1348597.71..1348598.21 rows=200 width=12) (actual time=2268.929..2268.935 rows=92 loops=1)" 
" Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))" 
" Sort Method: quicksort Memory: 32kB" 
" -> HashAggregate (cost=1348586.56..1348590.06 rows=200 width=12) (actual time=2268.829..2268.886 rows=92 loops=1)" 
"  -> Nested Loop (cost=0.00..1344864.01 rows=744510 width=12) (actual time=0.807..2084.206 rows=134893 loops=1)" 
"    Join Filter: ((m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (sc.station_id = m.station_id))" 
"    -> Nested Loop (cost=0.00..12755.07 rows=1220 width=18) (actual time=0.502..521.937 rows=23 loops=1)" 
"     Join Filter: ((sec_to_gc(cube_distance((ll_to_earth((c.latitude_decimal)::double precision, (c.longitude_decimal)::double precision))::cube, (ll_to_earth((s.latitude_decimal)::double precision, (s.longitude_decimal)::double precision))::cube))/1000::double precision) <= 30::double precision)" 
"     -> Index Scan using city_pkey1 on city c (cost=0.00..2.47 rows=1 width=16) (actual time=0.014..0.015 rows=1 loops=1)" 
"       Index Cond: (id = 5182)" 
"     -> Nested Loop (cost=0.00..9907.73 rows=3659 width=34) (actual time=0.014..28.937 rows=3458 loops=1)" 
"       -> Seq Scan on station_category sc (cost=0.00..970.20 rows=3659 width=14) (actual time=0.008..10.947 rows=3458 loops=1)" 
"        Filter: ((taken_start >= '1900-01-01'::date) AND (taken_end <= '1996-12-31'::date) AND (category_id = 1))" 
"       -> Index Scan using station_pkey1 on station s (cost=0.00..2.43 rows=1 width=20) (actual time=0.004..0.004 rows=1 loops=3458)" 
"        Index Cond: (s.id = sc.station_id)" 
"        Filter: (s.applicable AND (s.elevation >= 0) AND (s.elevation <= 3000))" 
"    -> Append (cost=0.00..1072.27 rows=947 width=18) (actual time=6.996..63.199 rows=5865 loops=23)" 
"     -> Seq Scan on measurement m (cost=0.00..25.00 rows=6 width=22) (actual time=0.000..0.000 rows=0 loops=23)" 
"       Filter: (m.category_id = 1)" 
"     -> Bitmap Heap Scan on measurement_001 m (cost=20.79..1047.27 rows=941 width=18) (actual time=6.995..62.390 rows=5865 loops=23)" 
"       Recheck Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))" 
"       -> Bitmap Index Scan on measurement_001_stc_idx (cost=0.00..20.55 rows=941 width=0) (actual time=5.775..5.775 rows=5865 loops=23)" 
"        Index Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))" 
"Total runtime: 2269.264 ms" 

1900 to 1997: Full Table Scan

"Sort (cost=1370192.26..1370192.76 rows=200 width=12) (actual time=86165.797..86165.809 rows=94 loops=1)" 
" Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))" 
" Sort Method: quicksort Memory: 32kB" 
" -> HashAggregate (cost=1370181.12..1370184.62 rows=200 width=12) (actual time=86165.654..86165.736 rows=94 loops=1)" 
"  -> Hash Join (cost=4293.60..1366355.81 rows=765061 width=12) (actual time=534.786..85920.007 rows=139721 loops=1)" 
"    Hash Cond: (m.station_id = sc.station_id)" 
"    Join Filter: ((m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end))" 
"    -> Append (cost=0.00..867005.80 rows=43670150 width=18) (actual time=0.009..79202.329 rows=43670079 loops=1)" 
"     -> Seq Scan on measurement m (cost=0.00..25.00 rows=6 width=22) (actual time=0.001..0.001 rows=0 loops=1)" 
"       Filter: (category_id = 1)" 
"     -> Seq Scan on measurement_001 m (cost=0.00..866980.80 rows=43670144 width=18) (actual time=0.008..73312.008 rows=43670079 loops=1)" 
"       Filter: (category_id = 1)" 
"    -> Hash (cost=4277.93..4277.93 rows=1253 width=18) (actual time=534.704..534.704 rows=25 loops=1)" 
"     -> Nested Loop (cost=847.87..4277.93 rows=1253 width=18) (actual time=415.837..534.682 rows=25 loops=1)" 
"       Join Filter: ((sec_to_gc(cube_distance((ll_to_earth((c.latitude_decimal)::double precision, (c.longitude_decimal)::double precision))::cube, (ll_to_earth((s.latitude_decimal)::double precision, (s.longitude_decimal)::double precision))::cube))/1000::double precision) <= 30::double precision)" 
"       -> Index Scan using city_pkey1 on city c (cost=0.00..2.47 rows=1 width=16) (actual time=0.012..0.014 rows=1 loops=1)" 
"        Index Cond: (id = 5182)" 
"       -> Hash Join (cost=847.87..1352.07 rows=3760 width=34) (actual time=6.427..35.107 rows=3552 loops=1)" 
"        Hash Cond: (s.id = sc.station_id)" 
"        -> Seq Scan on station s (cost=0.00..367.25 rows=7948 width=20) (actual time=0.004..23.529 rows=7949 loops=1)" 
"          Filter: (applicable AND (elevation >= 0) AND (elevation <= 3000))" 
"        -> Hash (cost=800.87..800.87 rows=3760 width=14) (actual time=6.416..6.416 rows=3552 loops=1)" 
"          -> Bitmap Heap Scan on station_category sc (cost=430.29..800.87 rows=3760 width=14) (actual time=2.316..5.353 rows=3552 loops=1)" 
"           Recheck Cond: (category_id = 1)" 
"           Filter: ((taken_start >= '1900-01-01'::date) AND (taken_end <= '1997-12-31'::date))" 
"           -> Bitmap Index Scan on station_category_station_category_idx (cost=0.00..429.35 rows=6376 width=0) (actual time=2.268..2.268 rows=6339 loops=1)" 
"             Index Cond: (category_id = 1)" 
"Total runtime: 86165.936 ms" 

Answers

Answer 1

The problem was that the station IDs were not distributed sequentially within the measurement tables. The solution:

CREATE UNIQUE INDEX measurement_001_stc_index 
    ON climate.measurement_001 
    USING btree 
    (station_id, taken, category_id); 
ALTER TABLE climate.measurement_001 CLUSTER ON measurement_001_stc_index; 

Clustering on this index physically rewrites the table on disk in index order, so the rows for each station ID are stored contiguously. This improved performance by an order of magnitude.
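Note that CLUSTER is a one-time physical rewrite: rows inserted afterwards are not kept in index order, so the operation may need repeating periodically, followed by a fresh ANALYZE. A sketch:

```sql
-- Re-cluster using the ordering recorded by ALTER TABLE ... CLUSTER ON
-- (takes an exclusive lock and rewrites the table)
CLUSTER climate.measurement_001;

-- Refresh planner statistics after the rewrite
ANALYZE climate.measurement_001;
```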

Answer 2

It looks like Postgres badly overestimates how many stations are near city 5182. It thinks there are 1220, but there are only 23.
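For reference (not part of the original answer): misestimates that stem from plain column predicates can often be fixed by raising the per-column statistics target and re-analyzing. That general technique would not help here, though, because the estimate goes wrong on a function expression (earth_distance), which the planner cannot gather column statistics for; hence the temporary-table workaround below. Shown for contrast only, with an illustrative target value:

```sql
-- General technique (does not fix the earth_distance estimate):
-- collect finer-grained statistics on a plain column, then re-analyze
ALTER TABLE climate.station ALTER COLUMN elevation SET STATISTICS 500;
ANALYZE climate.station;
```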

You could split it into two queries, forcing the stations to be selected first, like this (untested, may need tweaking):

start transaction; 
create temporary table s(id int); 
insert into s 
    select id from 
    climate.city c, 
    climate.station s 
    where 
    c.id = 5182 AND 
    earth_distance(
     ll_to_earth(c.latitude_decimal,c.longitude_decimal), 
     ll_to_earth(s.latitude_decimal,s.longitude_decimal))/1000 <= 30 AND 
    s.elevation BETWEEN 0 AND 3000 AND 
    s.applicable = TRUE; 
analyze s; 

SELECT 
    extract(YEAR FROM m.taken) AS year, 
    avg(m.amount) AS amount 
    FROM 
    climate.station_category sc, 
    climate.measurement m, 
    s 
    WHERE 
    sc.category_id = 1 AND 
    sc.taken_start >= '1900-01-01'::date AND 
    sc.taken_end <= '1996-12-31'::date AND 
    m.station_id = sc.station_id AND 
    m.taken BETWEEN sc.taken_start AND sc.taken_end AND 
    m.category_id = sc.category_id AND 
    sc.station_id = s.id 
    GROUP BY 
    extract(YEAR FROM m.taken) 
    ORDER BY 
    extract(YEAR FROM m.taken); 
rollback; 

You could also set enable_seqscan=off for this query. That forces Postgres to avoid sequential scans at (almost) any cost.
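If that override is used, it is safest to scope it to a single transaction so other queries keep normal planning; a sketch:

```sql
BEGIN;
SET LOCAL enable_seqscan = off;  -- reverts automatically at COMMIT/ROLLBACK
-- ... run the problem query here ...
COMMIT;
```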


On second thought, I have rewritten the single query as two queries. That way Postgres cannot overestimate the stations. Please try it and tell whether it is better. – Tometzky 2010-05-28 06:27:34


The stations are gathered in a subselect. The subselect was tuned to find the bounding box for the radius, which lets PostgreSQL use an index to eliminate stations: only stations falling within the minimal bounding box need to be checked to see whether they lie within the given radius. The big performance gain, however, came from aligning the physical model with the logical model via the CLUSTERed index. – 2010-06-26 20:16:22
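The bounding-box approach this comment describes can be sketched with the earthdistance module's earth_box, assuming a GiST expression index on ll_to_earth (the index itself is not shown in the question; column names are taken from it):

```sql
-- Assumed GiST index that makes the bounding-box test indexable:
CREATE INDEX station_position_idx
    ON climate.station
    USING gist (ll_to_earth(latitude_decimal, longitude_decimal));

-- Coarse, index-assisted cut: stations inside the bounding box of a
-- 30 km radius (earth_box takes the radius in metres) ...
SELECT s.id
    FROM climate.city c,
         climate.station s
    WHERE c.id = 5182 AND
    earth_box(
        ll_to_earth(c.latitude_decimal, c.longitude_decimal), 30000)
        @> ll_to_earth(s.latitude_decimal, s.longitude_decimal) AND
    -- ... then the exact radius check on the few survivors:
    earth_distance(
        ll_to_earth(c.latitude_decimal, c.longitude_decimal),
        ll_to_earth(s.latitude_decimal, s.longitude_decimal))/1000 <= 30;
```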