2016-10-28 63 views
0

我有一個執行多個連接的查詢。我嘗試只獲取結果中最新的每個關鍵字的位置。postgres兩列排序低性能

下面是該查詢:

與成本
SELECT DISTINCT ON (p.keyword_id) 
    a.id  AS account_id, 
    w.parent_id AS parent_id, 
    w.name  AS name, 
    p.position AS position 
FROM websites w 
    JOIN accounts a ON w.account_id = a.id 
    JOIN keywords k ON k.website_id = w.parent_id 
    JOIN positions p ON p.website_id = w.parent_id 
WHERE a.amount > 0 AND w.parent_id NOTNULL AND (round((a.amount/a.payment_renewal_period), 2) BETWEEN 1 AND 19) 
ORDER BY p.keyword_id, p.created_at DESC; 

計劃查詢是如下:

Unique (cost=73673.65..76630.38 rows=264 width=40) (actual time=30777.117..49143.023 rows=259 loops=1) 
     -> Sort (cost=73673.65..75152.02 rows=591347 width=40) (actual time=30777.116..47352.373 rows=10891486 loops=1) 
      Sort Key: p.keyword_id, p.created_at DESC 
      Sort Method: external merge Disk: 512672kB 
      -> Merge Join (cost=219.59..812.26 rows=591347 width=40) (actual time=3.487..3827.028 rows=10891486 loops=1) 
        Merge Cond: (w.parent_id = k.website_id) 
        -> Nested Loop (cost=128.46..597.73 rows=1268 width=44) (actual time=3.378..108.915 rows=61582 loops=1) 
         -> Nested Loop (cost=2.28..39.86 rows=1 width=28) (actual time=0.026..0.216 rows=7 loops=1) 
           -> Index Scan using index_websites_on_parent_id on websites w (cost=0.14..15.08 rows=4 width=28) (actual time=0.004..0.023 rows=7 loops=1) 
            Index Cond: (parent_id IS NOT NULL) 
           -> Bitmap Heap Scan on accounts a (cost=2.15..6.18 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=7) 
            Recheck Cond: (id = w.account_id) 
            Filter: ((amount > '0'::numeric) AND (round((amount/(payment_renewal_period)::numeric), 2) >= '1'::numeric) AND (round((amount/(payment_renewal_period)::numeric), 2) <= '19'::numeric)) 
            Heap Blocks: exact=7 
            -> Bitmap Index Scan on accounts_pkey (cost=0.00..2.15 rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=7) 
              Index Cond: (id = w.account_id) 
         -> Bitmap Heap Scan on positions p (cost=126.18..511.57 rows=4631 width=16) (actual time=0.994..8.226 rows=8797 loops=7) 
           Recheck Cond: (website_id = w.parent_id) 
           Heap Blocks: exact=1004 
           -> Bitmap Index Scan on index_positions_on_5_columns (cost=0.00..125.02 rows=4631 width=0) (actual time=0.965..0.965 rows=8797 loops=7) 
            Index Cond: (website_id = w.parent_id) 
        -> Sort (cost=18.26..18.92 rows=264 width=4) (actual time=0.106..1013.966 rows=10891487 loops=1) 
         Sort Key: k.website_id 
         Sort Method: quicksort Memory: 37kB 
         -> Seq Scan on keywords k (cost=0.00..7.64 rows=264 width=4) (actual time=0.005..0.039 rows=263 loops=1) 
    Planning time: 1.081 ms 
    Execution time: 49184.222 ms 

的事情是,當我運行查詢與w.id,而不是在w.parent_id加入位置部分總成本降至

Unique (cost=3621.07..3804.99 rows=264 width=40) (actual time=128.430..139.550 rows=259 loops=1) 
    -> Sort (cost=3621.07..3713.03 rows=36784 width=40) (actual time=128.429..135.444 rows=40385 loops=1) 
     Sort Key: p.keyword_id, p.created_at DESC 
     Sort Method: external sort Disk: 2000kB 
     -> Merge Join (cost=128.73..831.59 rows=36784 width=40) (actual time=25.521..63.299 rows=40385 loops=1) 
       Merge Cond: (k.website_id = w.id) 
       -> Index Only Scan using index_keywords_on_website_id_deleted_at on keywords k (cost=0.27..24.23 rows=264 width=4) (actual time=0.137..0.274 rows=263 loops=1) 
        Heap Fetches: 156 
       -> Materialize (cost=128.46..606.85 rows=1268 width=44) (actual time=3.772..49.587 rows=72242 loops=1) 
        -> Nested Loop (cost=128.46..603.68 rows=1268 width=44) (actual time=3.769..30.530 rows=61582 loops=1) 
          -> Nested Loop (cost=2.28..45.80 rows=1 width=32) (actual time=0.047..0.204 rows=7 loops=1) 
           -> Index Scan using websites_pkey on websites w (cost=0.14..21.03 rows=4 width=32) (actual time=0.007..0.026 rows=7 loops=1) 
             Filter: (parent_id IS NOT NULL) 
             Rows Removed by Filter: 4 
           -> Bitmap Heap Scan on accounts a (cost=2.15..6.18 rows=1 width=4) (actual time=0.018..0.019 rows=1 loops=7) 
             Recheck Cond: (id = w.account_id) 
             Filter: ((amount > '0'::numeric) AND (round((amount/(payment_renewal_period)::numeric), 2) >= '1'::numeric) AND (round((amount/(payment_renewal_period)::numeric), 2) <= '19'::numeric)) 
             Heap Blocks: exact=7 
             -> Bitmap Index Scan on accounts_pkey (cost=0.00..2.15 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=7) 
              Index Cond: (id = w.account_id) 
          -> Bitmap Heap Scan on positions p (cost=126.18..511.57 rows=4631 width=16) (actual time=0.930..2.341 rows=8797 loops=7) 
           Recheck Cond: (website_id = w.parent_id) 
           Heap Blocks: exact=1004 
           -> Bitmap Index Scan on index_positions_on_5_columns (cost=0.00..125.02 rows=4631 width=0) (actual time=0.906..0.906 rows=8797 loops=7) 
             Index Cond: (website_id = w.parent_id) 
Planning time: 1.124 ms 
Execution time: 157.167 ms 

Web上的索引上的位置的網站

Indexes: 
    "websites_pkey" PRIMARY KEY, btree (id) 
    "index_websites_on_account_id" btree (account_id) 
    "index_websites_on_deleted_at" btree (deleted_at) 
    "index_websites_on_domain_id" btree (domain_id) 
    "index_websites_on_parent_id" btree (parent_id) 
    "index_websites_on_processed_at" btree (processed_at) 

指標

Indexes: 
    "positions_pkey" PRIMARY KEY, btree (id) 
    "index_positions_on_5_columns" UNIQUE, btree (website_id, keyword_id, created_at, engine_id, region_id) 
    "overlap_index" btree (keyword_id, created_at) 
+0

請向我們展示使用'explain(analyze)'生成的執行計劃。 –

+0

使用parent_id屬性?我會嘗試但不知道查詢何時結束... – Robs

+0

@a_horse_with_no_name我已更新問題。另外,提供'解析分析'信息,但我必須在數據較少的分貝上執行此操作。 – Robs

回答

1

第二EXPLAIN輸出顯示超過200次的行數少,所以它是不足爲奇的排序要快得多。

您會注意到,在這兩種情況下,排序都會溢出到磁盤(Sort Method: external merge Disk: ...kB)。如果你可以通過提高work_mem來保持內存中的排序,那麼將會更快地得到很多
但是第一類太大了,你將無法適應內存。

思路,加快查詢:

  • (keyword_id, created_at)positions的索引。不知道這是否有幫助。

  • 執行濾波第一,像這樣:

    SELECT 
        a.id  AS account_id, 
        w.parent_id AS parent_id, 
        w.name  AS name, 
        p.position AS position 
    FROM (SELECT DISTINCT ON (keyword_id) 
         positions, 
         website_id, 
         keyword_id, 
         created_at 
         FROM positions 
         ORDER BY keyword_id, created_at DESC) p 
        JOIN ... 
    WHERE ... 
    ORDER BY p.keyword_id, p.created_at DESC; 
    

備註:DISTINCT ON是有些奇怪,因爲你不ORDER BYSELECT列表的值,所以結果值沒有明確界定。

+0

是的,我剛纔意識到這一點。不幸的是,正如你所寫的那樣,work_mem在這裏沒有任何意義(例子只是開發結果)。但有沒有機會調整一些額外/特定索引的查詢? – Robs

+0

擴展答案。 –

+0

感謝您的想法。 – Robs