Why doesn't Postgres use an index on a simple GROUP BY?
I have created a 36M-row table with an index on the type column:
CREATE TABLE items AS
SELECT
(random()*36000000)::integer AS id,
(random()*10000)::integer AS type,
md5(random()::text) AS s
FROM
generate_series(1,36000000);
CREATE INDEX items_type_idx ON items USING btree ("type");
I run this simple query and expect PostgreSQL to use my index:
explain analyze select count(*) from "items" group by "type";
but the query planner decides to use a sequential scan instead:
HashAggregate (cost=734592.00..734627.90 rows=3590 width=12) (actual time=6477.913..6478.344 rows=3601 loops=1)
  Group Key: type
  -> Seq Scan on items (cost=0.00..554593.00 rows=35999800 width=4) (actual time=0.044..1820.522 rows=36000000 loops=1)
Planning time: 0.107 ms
Execution time: 6478.525 ms
Time without EXPLAIN: 5s 979ms
I tried:
- running VACUUM ANALYZE
- tuning default_statistics_target, random_page_cost and work_mem
but nothing helps except setting enable_seqscan = OFF:
SET enable_seqscan = OFF;
explain analyze select count(*) from "items" group by "type";
GroupAggregate (cost=0.56..1114880.46 rows=3590 width=12) (actual time=5.637..5256.406 rows=3601 loops=1)
  Group Key: type
  -> Index Only Scan using items_type_idx on items (cost=0.56..934845.56 rows=35999800 width=4) (actual time=0.074..2783.896 rows=36000000 loops=1)
        Heap Fetches: 0
Planning time: 0.103 ms
Execution time: 5256.667 ms
Time without EXPLAIN: 659ms
The query with the index scan is about 10 times faster on my machine.
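Keep in mind that the planner picks the plan with the lowest estimated cost, not the lowest measured time. By its own numbers the index plan (GroupAggregate over an Index Only Scan, total cost 1114880.46) is only about 1.5x more expensive than the seq-scan plan (HashAggregate over a Seq Scan, total cost 734627.90), yet the seq-scan plan actually ran about 9x slower (5979 ms vs 659 ms without EXPLAIN). The query below just does that arithmetic in psql; all numbers are copied from the two plans above.

```sql
-- Estimated total cost (planner units) vs. measured time (ms), copied from the two plans.
SELECT
    round(1114880.46 / 734627.90, 2) AS est_cost_ratio_index_vs_seq,   -- planner: index plan ~1.5x dearer
    round(5979.0 / 659.0, 2)          AS measured_time_ratio_seq_vs_index; -- reality: seq plan ~9x slower
```

The gap between the two ratios is the whole question: the cost model overestimates the index-only scan (or underestimates the seq scan plus hash aggregate) on this machine.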
Is there a better solution than setting enable_seqscan?
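If enable_seqscan = OFF has to be used at all, a commonly used way to limit its effect (it does not fix the underlying cost estimate) is to scope it to a single transaction with SET LOCAL. The setting then reverts automatically at COMMIT or ROLLBACK, so other queries in the same session keep the normal planner behaviour. A minimal sketch:

```sql
BEGIN;
-- SET LOCAL only lasts until the end of the current transaction.
-- enable_seqscan = off does not forbid seq scans; it makes the planner treat them as very expensive.
SET LOCAL enable_seqscan = off;
SELECT count(*) FROM items GROUP BY type;
COMMIT;  -- enable_seqscan is back to its previous value here
```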
UPD1
My PostgreSQL version is 9.6.3, work_mem = 4MB (also tried 64MB), random_page_cost = 4 (also tried 1.1), max_parallel_workers_per_gather = 0 (also tried 4).
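The settings listed in UPD1 can be inspected and overridden for the current session only, without editing postgresql.conf, before re-running the plan. A sketch using the alternative values reported above:

```sql
-- Inspect the current values.
SHOW server_version;
SHOW work_mem;
SHOW random_page_cost;
SHOW max_parallel_workers_per_gather;

-- Session-level overrides (values taken from UPD1).
SET work_mem = '64MB';
SET random_page_cost = 1.1;
SET max_parallel_workers_per_gather = 4;

EXPLAIN ANALYZE SELECT count(*) FROM items GROUP BY type;

-- RESET returns each parameter to its default value for this session.
RESET work_mem;
RESET random_page_cost;
RESET max_parallel_workers_per_gather;
```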
UPD2
I tried filling the type column not with random numbers but with i/10000, so that pg_stats.correlation = 1, but the planner still chooses a seq scan.
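A sketch of the UPD2 experiment, assuming the same table layout as in the question but with type derived from the row number (i/10000), so that it increases with physical row order. The table and index names items_sorted and items_sorted_type_idx are hypothetical, chosen so the original table is kept for comparison.

```sql
-- Hypothetical copy of the table in which "type" follows physical row order.
CREATE TABLE items_sorted AS
SELECT
    (random()*36000000)::integer AS id,
    (i/10000)::integer AS type,        -- integer division: 0..3600, increasing with i
    md5(random()::text) AS s
FROM generate_series(1,36000000) AS i;

CREATE INDEX items_sorted_type_idx ON items_sorted USING btree (type);
ANALYZE items_sorted;

-- Correlation between "type" and physical row order; expected at or near 1 here.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'items_sorted' AND attname = 'type';
```

As the author reports, a perfect correlation alone did not change the plan, which points away from ordering and towards the relative size of heap and index.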
UPD3
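@jgh is 100% correct: "This typically only happens when the table's row width is much wider than some indexes." I made the table much wider with a data column, and now Postgres uses the index. Thanks, everyone!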
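The explanation in UPD3 can be checked by comparing heap size with index size. A sequential scan reads every heap page, while an index-only scan on an all-visible table (Heap Fetches: 0 in the plan above) reads mostly index pages. So the wider the rows are relative to the indexed column, the more the seq scan costs compared with the index-only scan. A sketch using standard catalog views and functions:

```sql
-- Page and row-count estimates the planner uses for costing.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('items', 'items_type_idx');

-- The same comparison in human-readable sizes.
SELECT pg_size_pretty(pg_relation_size('items'))          AS heap_size,
       pg_size_pretty(pg_relation_size('items_type_idx')) AS index_size;
```

With the original narrow rows (two integers and a 32-character md5 string), the heap is not much larger than the index, which is consistent with the planner favouring the seq scan until the table was widened.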
What is your PostgreSQL version? Also, please provide the 'EXPLAIN ANALYZE' output. –
http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=0c5c410657513d1bda7f2e21a4d36eb9 - just compare the actual timings for 'enable_seqscan = ON' and 'enable_seqscan = OFF' – Abelisto
What are your settings for work_mem and random_page_cost? [And: why does the table have no primary key?] – wildplasser