2013-12-14 28 views
1

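Appropriate indexes for a join query in PostgreSQL

I have two tables:

users: id | name | ...

pull_requests: id | user_id | created_at | ...

I need to retrieve all users together with the number of pull requests each of them made in a specific year. So I wrote a query like this: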

SELECT users.*, COUNT(pull_requests.id) as pull_requests_count 
FROM "users" INNER JOIN 
    "pull_requests" 
    ON "pull_requests"."user_id" = "users"."id" 
WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013) 
GROUP BY users.id 

I initially had a B-tree index on pull_requests.user_id. Running EXPLAIN gave me this:

          QUERY PLAN 
-------------------------------------------------------------------------------------------- 
HashAggregate (cost=63.99..64.02 rows=3 width=2775) 
  -> Hash Join (cost=59.19..63.98 rows=3 width=2775) 
       Hash Cond: (users.id = pull_requests.user_id) 
       -> Seq Scan on users (cost=0.00..4.08 rows=108 width=2771) 
       -> Hash (cost=59.16..59.16 rows=3 width=8) 
            -> Seq Scan on pull_requests (cost=0.00..59.16 rows=3 width=8) 
                 Filter: (date_part('year'::text, created_at) = 2013::double precision) 

Then I added an index like this:

CREATE INDEX pull_req_extract_year_created_at_ix ON pull_requests (EXTRACT(year FROM created_at)); 

Now my EXPLAIN output is:

             QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------- 
HashAggregate (cost=18.93..18.96 rows=3 width=2775) 
  -> Hash Join (cost=14.13..18.92 rows=3 width=2775) 
       Hash Cond: (users.id = pull_requests.user_id) 
       -> Seq Scan on users (cost=0.00..4.08 rows=108 width=2771) 
       -> Hash (cost=14.09..14.09 rows=3 width=8) 
            -> Bitmap Heap Scan on pull_requests (cost=4.28..14.09 rows=3 width=8) 
                 Recheck Cond: (date_part('year'::text, created_at) = 2013::double precision) 
                 -> Bitmap Index Scan on pull_req_extract_year_created_at_ix (cost=0.00..4.28 rows=3 width=0) 
                      Index Cond: (date_part('year'::text, created_at) = 2013::double precision) 

However, the query still takes 6.6 ms for 100 or so rows. How can I optimize this further?

Thanks!

+0

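If you want to improve a 6.6 ms query, you should think thrice. Really: once the database grows and your query no longer fits in memory, the time could exceed 1000 ms. – wildplasser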

+0

Actually, I am also applying a LIMIT 200, which is not shown here (I forgot to add it). Is the query OK in that case? –

+1

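'LIMIT 200' also needs an 'ORDER BY' (unless you want an arbitrary selection of 200 from the aggregated users.id values), and that rules out the outer hash join, leaving an index join, a nested loop, or an explicit sort step (which will blow up your query's footprint). LIMIT is an ugly beast. – wildplasser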
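To make the point about LIMIT concrete, here is a minimal sketch of a deterministic top-N version of the query. It is an illustration under assumptions, not the asker's actual setup: SQLite (via Python's sqlite3) stands in for PostgreSQL, so strftime replaces EXTRACT, the sample rows are made up, and ordering by the count with users.id as tie-breaker is only one possible choice of ORDER BY.

```python
import sqlite3


def make_conn():
    """Build an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE pull_requests (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users(id),
            created_at TEXT
        );
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO pull_requests VALUES
            (1, 1, '2013-01-05 10:00:00'),
            (2, 1, '2013-06-10 12:30:00'),
            (3, 2, '2013-12-14 09:15:00'),
            (4, 3, '2013-02-02 08:00:00'),
            (5, 3, '2012-03-03 17:45:00');
    """)
    return conn


# Assumed ordering: most pull requests first, users.id as tie-breaker.
# Without the tie-breaker, users with equal counts could come back in any order,
# so the rows cut off by LIMIT would not be well defined.
TOP_N = """
    SELECT users.id, users.name, COUNT(pull_requests.id) AS pull_requests_count
    FROM users
    INNER JOIN pull_requests ON pull_requests.user_id = users.id
    WHERE CAST(strftime('%Y', pull_requests.created_at) AS INTEGER) = ?
    GROUP BY users.id, users.name
    ORDER BY pull_requests_count DESC, users.id
    LIMIT ?
"""


def top_users(conn, year, limit):
    """Return up to `limit` (id, name, count) rows for the given year."""
    return conn.execute(TOP_N, (year, limit)).fetchall()
```

Calling top_users(make_conn(), 2013, 2) always picks the same two users, because bob and carol tie on the count and the tie is broken by id. The cost wildplasser mentions still applies: the sort has to see every aggregated row before the limit can cut anything off.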

Answers

1

Try combining these two indexes into one:

CREATE INDEX pr_ix ON pull_requests(EXTRACT(year FROM created_at), user_id); 

and then phrase the query as:

SELECT users.*, pull_requests_count 
FROM "users" INNER JOIN 
    (select user_id, count(*) as pull_requests_count 
     from "pull_requests" 
     WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013) 
     group by user_id 
    ) pr 
    ON pr."user_id" = "users"."id"; 

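This index fully covers the subquery, so the subquery should no longer need to read the pull_requests table itself, only scan the index. Its result can then be joined back to users.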
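As a sanity check on the rewrite, the following sketch compares the join-then-aggregate form from the question with the aggregate-then-join form above and confirms they return the same counts. It rests on assumptions: SQLite (via Python's sqlite3) stands in for PostgreSQL, strftime replaces EXTRACT, the sample rows are hypothetical, and only id and name are selected instead of users.*. It does not create the composite index or measure anything; whether PostgreSQL actually chooses an index-only scan is best confirmed with EXPLAIN on the real table.

```python
import sqlite3


def build_db():
    """Build an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE pull_requests (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users(id),
            created_at TEXT
        );
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO pull_requests VALUES
            (1, 1, '2013-01-05 10:00:00'),
            (2, 1, '2013-06-10 12:30:00'),
            (3, 2, '2013-12-14 09:15:00'),
            (4, 2, '2012-11-01 08:00:00'),
            (5, 3, '2012-03-03 17:45:00');
    """)
    return conn


# Form used in the question: join every matching pull request, then aggregate.
JOIN_THEN_GROUP = """
    SELECT users.id, users.name, COUNT(pull_requests.id) AS pull_requests_count
    FROM users
    INNER JOIN pull_requests ON pull_requests.user_id = users.id
    WHERE CAST(strftime('%Y', pull_requests.created_at) AS INTEGER) = ?
    GROUP BY users.id, users.name
    ORDER BY users.id
"""

# Form used in the answer: aggregate pull_requests alone, then join the
# (much smaller) per-user result back to users.
GROUP_THEN_JOIN = """
    SELECT users.id, users.name, pr.pull_requests_count
    FROM users
    INNER JOIN (
        SELECT user_id, COUNT(*) AS pull_requests_count
        FROM pull_requests
        WHERE CAST(strftime('%Y', created_at) AS INTEGER) = ?
        GROUP BY user_id
    ) AS pr ON pr.user_id = users.id
    ORDER BY users.id
"""


def counts_for_year(conn, sql, year):
    """Run one of the two queries and return (id, name, count) rows."""
    return conn.execute(sql, (year,)).fetchall()
```

Both forms drop users with no pull requests in the chosen year, since the join is an INNER JOIN in each case. The ORDER BY users.id is there only so the two result lists can be compared directly.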

+0

Will try. Thanks –