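Proper index for a join query in PostgreSQL
I have two tables:
users: id | name | ...
pull_requests: id | user_id | created_at | ...
I need to fetch all users together with the number of pull requests each of them created in a specific year. So I wrote a query like this: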
SELECT users.*, COUNT(pull_requests.id) as pull_requests_count
FROM "users" INNER JOIN
"pull_requests"
ON "pull_requests"."user_id" = "users"."id"
WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013)
GROUP BY users.id
I initially had an index on pull_requests.user_id (B-tree). Running EXPLAIN, I got this:
QUERY PLAN
--------------------------------------------------------------------------------------------
HashAggregate (cost=63.99..64.02 rows=3 width=2775)
  -> Hash Join (cost=59.19..63.98 rows=3 width=2775)
       Hash Cond: (users.id = pull_requests.user_id)
       -> Seq Scan on users (cost=0.00..4.08 rows=108 width=2771)
       -> Hash (cost=59.16..59.16 rows=3 width=8)
            -> Seq Scan on pull_requests (cost=0.00..59.16 rows=3 width=8)
                 Filter: (date_part('year'::text, created_at) = 2013::double precision)
Then I added an index like this:
CREATE INDEX pull_req_extract_year_created_at_ix ON pull_requests (EXTRACT(year FROM created_at));
Now my EXPLAIN output is:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=18.93..18.96 rows=3 width=2775)
  -> Hash Join (cost=14.13..18.92 rows=3 width=2775)
       Hash Cond: (users.id = pull_requests.user_id)
       -> Seq Scan on users (cost=0.00..4.08 rows=108 width=2771)
       -> Hash (cost=14.09..14.09 rows=3 width=8)
            -> Bitmap Heap Scan on pull_requests (cost=4.28..14.09 rows=3 width=8)
                 Recheck Cond: (date_part('year'::text, created_at) = 2013::double precision)
                 -> Bitmap Index Scan on pull_req_extract_year_created_at_ix (cost=0.00..4.28 rows=3 width=0)
                      Index Cond: (date_part('year'::text, created_at) = 2013::double precision)
Still, I get 6.6 ms for 100 or so rows. How can I optimize this further?
Thanks!
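One alternative that is often suggested for this kind of filter, shown here only as an untested sketch: write the year condition as a half-open range on created_at, so that a plain B-tree index on the raw column can serve it instead of an index on the EXTRACT expression. The index name pull_requests_created_at_ix is hypothetical. The sketch assumes created_at is a timestamp without time zone and that users.id is the primary key (which is what lets GROUP BY users.id cover users.*). Whether the planner actually uses the index depends on table size and statistics.

```sql
-- Hypothetical index name; a plain B-tree on the raw column.
CREATE INDEX pull_requests_created_at_ix ON pull_requests (created_at);

-- Same result as filtering on EXTRACT(year FROM created_at) = 2013,
-- but the predicate compares the indexed column directly.
SELECT users.*, COUNT(pull_requests.id) AS pull_requests_count
FROM users
INNER JOIN pull_requests
        ON pull_requests.user_id = users.id
WHERE pull_requests.created_at >= '2013-01-01'
  AND pull_requests.created_at <  '2014-01-01'
GROUP BY users.id;
```

The same range index can serve any year or arbitrary date window, whereas the expression index only helps queries that filter on the extracted year.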
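If you want to improve a 6.6 ms query, you will have to think three times. Really: once the database gets larger and your query no longer fits in memory, the time will probably go above 1000 ms. – wildplasser
Actually I am doing a LIMIT 200, which is not shown here (I forgot to add it). Is this okay in that case? –
'limit 200' also needs an 'order by' (unless you want to pick 200 arbitrary rows from the aggregated users.id values), which makes the outer hash join impossible and leads to an index join, a nested loop, or an explicit sort step (which will blow up your query's footprint). LIMIT is an ugly beast. – wildplasser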
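To make the point about LIMIT concrete, here is a sketch of the original query with an explicit ordering added. The sort keys are an assumption (highest pull request count first, with users.id as a tie-breaker); the thread does not say which 200 rows are wanted.

```sql
SELECT users.*, COUNT(pull_requests.id) AS pull_requests_count
FROM users
INNER JOIN pull_requests
        ON pull_requests.user_id = users.id
WHERE EXTRACT(year FROM pull_requests.created_at) = 2013
GROUP BY users.id
-- Assumed ordering: without ORDER BY, LIMIT returns an arbitrary 200 rows.
ORDER BY pull_requests_count DESC, users.id
LIMIT 200;
```

Running EXPLAIN on this variant shows whether the planner adds a sort step or changes the join strategy for the data at hand.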