Proper indexes for a join query in PostgreSQL

users: id | name | ...

pull_requests: id | user_id | created_at | ...
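For concreteness, the two tables might look like the sketch below. Only the table and column names come from the question; the types, constraints, and the index name are assumptions. created_at is assumed to be timestamp without time zone, because EXTRACT on a timestamptz column is not immutable and an expression index on it could not be created.

```sql
-- Hypothetical DDL: column names are from the question, types are assumed.
CREATE TABLE users (
    id   serial PRIMARY KEY,
    name text
    -- other columns are elided in the question
);

CREATE TABLE pull_requests (
    id         serial PRIMARY KEY,
    user_id    integer NOT NULL REFERENCES users (id),
    created_at timestamp without time zone NOT NULL
    -- other columns are elided in the question
);

-- A plain btree index on the join column (the index name is made up here).
CREATE INDEX pull_requests_user_id_ix ON pull_requests (user_id);
```

The primary key on users.id matters for the query that follows: it is what lets PostgreSQL accept SELECT users.* together with GROUP BY users.id.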

I need to fetch all users together with the number of pull requests each of them made in a specific year. So I wrote a query like this:

SELECT users.*, COUNT(pull_requests.id) as pull_requests_count 
FROM "users" INNER JOIN 
    "pull_requests" 
    ON "pull_requests"."user_id" = "users"."id" 
WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013) 
GROUP BY users.id

I originally had an index on pull_requests.user_id (btree). Running EXPLAIN, I got this:

                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 HashAggregate  (cost=63.99..64.02 rows=3 width=2775)
   ->  Hash Join  (cost=59.19..63.98 rows=3 width=2775)
         Hash Cond: (users.id = pull_requests.user_id)
         ->  Seq Scan on users  (cost=0.00..4.08 rows=108 width=2771)
         ->  Hash  (cost=59.16..59.16 rows=3 width=8)
               ->  Seq Scan on pull_requests  (cost=0.00..59.16 rows=3 width=8)
                     Filter: (date_part('year'::text, created_at) = 2013::double precision)

Then I added an index like this:

CREATE INDEX pull_req_extract_year_created_at_ix ON pull_requests (EXTRACT(year FROM created_at));

Now my EXPLAIN output is:

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=18.93..18.96 rows=3 width=2775)
   ->  Hash Join  (cost=14.13..18.92 rows=3 width=2775)
         Hash Cond: (users.id = pull_requests.user_id)
         ->  Seq Scan on users  (cost=0.00..4.08 rows=108 width=2771)
         ->  Hash  (cost=14.09..14.09 rows=3 width=8)
               ->  Bitmap Heap Scan on pull_requests  (cost=4.28..14.09 rows=3 width=8)
                     Recheck Cond: (date_part('year'::text, created_at) = 2013::double precision)
                     ->  Bitmap Index Scan on pull_req_extract_year_created_at_ix  (cost=0.00..4.28 rows=3 width=0)
                           Index Cond: (date_part('year'::text, created_at) = 2013::double precision)

Still, I get 6.6 ms for 100 or so rows. How can I optimise this further?

Thanks!
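Plain EXPLAIN reports only the planner's cost estimates, not measured time. As a sketch of how the 6.6 ms could be broken down per plan node, the question's query can be run under EXPLAIN with the ANALYZE and BUFFERS options; no sample output is shown because it depends on the data and the server.

```sql
-- ANALYZE executes the query and reports actual row counts and timings
-- next to the estimates; BUFFERS adds shared-buffer hit and read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT users.*, COUNT(pull_requests.id) AS pull_requests_count
FROM users
INNER JOIN pull_requests ON pull_requests.user_id = users.id
WHERE EXTRACT(year FROM pull_requests.created_at) = 2013
GROUP BY users.id;
```

Because ANALYZE really runs the statement, it is best used on SELECTs or inside a transaction that is rolled back.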

Source

2013-12-14 Steve Robinson

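If you want to improve a query that takes 6.6 ms, you should think thrice. Really: once the database grows and your query no longer fits in memory, the timing may well go over 1000 ms. – wildplasser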

Actually, I am also doing a LIMIT 200, which is not shown here (I forgot to add it). Is this fine in that case? –

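'limit 200' also needs an 'order by' (unless you want an arbitrary selection of 200 from the aggregated users.id values), which would make the outer hash join impossible, leading to an indexed join, a nested loop, or an explicit sort step (which will blow up your query's footprint). LIMIT is an ugly beast. – wildplasser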
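To illustrate the comment above, here is a minimal sketch of the question's query with an explicit ordering before the limit. The sort keys are an assumption (most pull requests first, ties broken by users.id); the question does not say which 200 rows are wanted.

```sql
SELECT users.*, COUNT(pull_requests.id) AS pull_requests_count
FROM users
INNER JOIN pull_requests ON pull_requests.user_id = users.id
WHERE EXTRACT(year FROM pull_requests.created_at) = 2013
GROUP BY users.id
-- Without ORDER BY, LIMIT returns an unspecified subset of the groups.
-- The tie-breaker on users.id makes the result deterministic.
ORDER BY pull_requests_count DESC, users.id
LIMIT 200;
```

Since the ordering is on the aggregate, every group has to be computed before the first 200 can be picked, which is the extra sort step the comment warns about.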

Try combining these two indexes into one:

CREATE INDEX pr_ix ON pull_requests(EXTRACT(year FROM created_at), user_id);

and then phrase the query as:

SELECT users.*, pull_requests_count 
FROM "users" INNER JOIN 
    (select user_id, count(*) as pull_requests_count 
     from "pull_requests" 
     WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013) 
     group by user_id 
    ) pr 
    ON pr."user_id" = "users"."id";

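The index completely covers the subquery, so the original table should no longer be needed, just an index scan. The aggregated result can then be joined back to users.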
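Whether the planner really chooses an index-only scan depends on the PostgreSQL version, the state of the table's visibility map, and the table size; the PostgreSQL documentation also notes that the planner does not always recognise an expression index as covering. One variant that avoids the expression entirely, offered as a sketch rather than as part of the original answer, filters on a half-open date range over the plain column. The index name pr_created_user_ix is made up for illustration, and created_at is assumed to be timestamp without time zone.

```sql
-- Plain two-column index: no expression, so it also serves other date ranges.
CREATE INDEX pr_created_user_ix ON pull_requests (created_at, user_id);

-- Refresh statistics and the visibility map so an index-only scan
-- can skip heap fetches.
VACUUM ANALYZE pull_requests;

EXPLAIN
SELECT users.*, pr.pull_requests_count
FROM users
INNER JOIN (
    SELECT user_id, count(*) AS pull_requests_count
    FROM pull_requests
    -- Half-open range covering all of 2013; for a timestamp column this
    -- selects the same rows as EXTRACT(year FROM created_at) = 2013.
    WHERE created_at >= '2013-01-01' AND created_at < '2014-01-01'
    GROUP BY user_id
) pr ON pr.user_id = users.id;
```

On a table with only about a hundred rows the planner may still prefer a sequential scan, so the plan is worth re-checking as the data grows.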

Source

2013-12-14 23:42:52

Will try. Thanks –
