PostgreSQL - fetch is very slow even with an index

I'm running PostgreSQL 9.4 on CentOS 6.7. One of the tables contains many millions of records. This is the DDL:

CREATE TABLE domain.examples (
    id SERIAL, 
    sentence VARCHAR, 
    product_id BIGINT, 
    site_id INTEGER, 
    time_stamp BIGINT, 
    category_id INTEGER, 
    CONSTRAINT examples_pkey PRIMARY KEY(id) 
) 
WITH (oids = false); 

CREATE INDEX examples_categories ON domain.examples 
    USING btree (category_id); 

CREATE INDEX examples_site_idx ON domain.examples 
    USING btree (site_id);

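The application that consumes the data uses pagination, so we fetch the records in batches of 1000. However, even when fetching by an indexed column, the fetch is very slow: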

explain analyze 
select * 
from domain.examples e 
where e.category_id = 105154 
order by id asc 
limit 1000; 

Limit (cost=0.57..331453.23 rows=1000 width=280) (actual time=2248261.276..2248296.600 rows=1000 loops=1) 
    -> Index Scan using examples_pkey on examples e (cost=0.57..486638470.34 rows=1468199 width=280) (actual time=2248261.269..2248293.705 rows=1000 loops=1) 
     Filter: (category_id = 105154) 
     Rows Removed by Filter: 173306740 
Planning time: 70.821 ms 
Execution time: 2248328.457 ms

What is causing the query to be slow, and how can it be improved?

Thanks!

2017-02-08 Seffy

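Are all those '_id' columns supposed to be foreign keys? They don't seem to be declared as such. How big is the content of 'sentence'? It's possible that your cache is cold, or that the server's disk is overloaded. Try it again. – Schwern

Would declaring them as foreign keys improve performance? The fetch is from this table only; no joins are involved. 'sentence' holds very short strings, and running the query again and again gives the same performance. – Seffy

Do you have valid statistics? -->> 'VACUUM ANALYZE domain.examples;' BTW, is 'e.category_id' a low-cardinality column? – wildplasser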
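
The two questions in the last comment can be checked directly. The following is a minimal sketch using the standard pg_stats view; the notes in the comments are general guidance, not output from the asker's server.

```sql
-- Refresh planner statistics (and clean up dead rows) for the table.
VACUUM ANALYZE domain.examples;

-- Inspect what the planner now believes about category_id.
-- n_distinct > 0: estimated number of distinct values.
-- n_distinct < 0: minus the ratio of distinct values to the row count.
SELECT n_distinct,
       most_common_vals,
       most_common_freqs,
       correlation
FROM pg_stats
WHERE schemaname = 'domain'
  AND tablename  = 'examples'
  AND attname    = 'category_id';
```

A small n_distinct relative to the table size would indicate a low-cardinality column.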

You can create an index on both columns, category_id and id:

CREATE INDEX examples_site_idx2 ON domain.examples 
    USING btree (category_id, id);

I tried EXPLAIN ANALYZE with the query on a table of 3 million rows.

With the old index:

                QUERY PLAN                 
---------------------------------------------------------------------------------------------------------------------------------------------- 
Limit (cost=0.43..9234.56 rows=1000 width=60) (actual time=0.655..597.193 rows=322 loops=1) 
    -> Index Scan using examples_pkey on examples e (cost=0.43..138512.43 rows=15000 width=60) (actual time=0.654..597.142 rows=322 loops=1) 
     Filter: (category_id = 105154) 
     Rows Removed by Filter: 2999678 
Planning time: 2.295 ms 
Execution time: 597.257 ms 
(6 rows)

With the new index:

                QUERY PLAN                  
------------------------------------------------------------------------------------------------------------------------------------------------- 
Limit (cost=0.43..2585.13 rows=1000 width=60) (actual time=0.027..28.814 rows=322 loops=1) 
    -> Index Scan using examples_site_idx2 on examples e (cost=0.43..38770.93 rows=15000 width=60) (actual time=0.026..28.777 rows=322 loops=1) 
     Index Cond: (category_id = 105154) 
Planning time: 1.471 ms 
Execution time: 28.860 ms 
(5 rows)
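
The plans suggest why this helps. Without the composite index, the planner walks examples_pkey in id order and filters row by row until it has collected enough matches. With an index on (category_id, id), the entries for one category are already stored in id order, so the scan can stop after 1000 entries with no filtering and no sort. Below is a sketch of how this might be applied to a large, busy table. The index name examples_category_id_id_idx, the use of CONCURRENTLY, and the keyset-style follow-up page are assumptions of this sketch rather than part of the answer, and 123456 is a placeholder for the last id returned by the previous page.

```sql
-- Build the composite index without blocking writes to the table.
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY examples_category_id_id_idx
    ON domain.examples USING btree (category_id, id);

-- First page: the index returns rows for this category already ordered by id.
SELECT *
FROM domain.examples e
WHERE e.category_id = 105154
ORDER BY e.id ASC
LIMIT 1000;

-- Next page: continue after the last id seen on the previous page
-- (123456 is a placeholder value).
SELECT *
FROM domain.examples e
WHERE e.category_id = 105154
  AND e.id > 123456
ORDER BY e.id ASC
LIMIT 1000;
```

Continuing from the last id keeps every page an index range scan, whereas an OFFSET-based page would have to read and discard all earlier rows.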

2017-02-09 08:06:17

Thanks! Works like a charm :-) – Seffy

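This is not the plan you want: PostgreSQL is scanning the entire examples_pkey index and filtering out the records that do not satisfy category_id = 105154. You can try to get better statistics on the table with ANALYZE, or play with the system GUCs (which I really don't recommend), to make the planner choose the right index.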
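
As one way to act on the "better statistics" suggestion, the sketch below raises the per-column statistics target before re-analyzing. The value 1000 is an arbitrary illustration, not a figure from this answer, and a finer histogram does not by itself guarantee that the planner will switch indexes.

```sql
-- Collect a more detailed sample for category_id
-- (the default target, default_statistics_target, is 100).
ALTER TABLE domain.examples
    ALTER COLUMN category_id SET STATISTICS 1000;

-- Re-gather statistics so the new target takes effect.
ANALYZE domain.examples;

-- Check the plan again; plain EXPLAIN does not execute the query.
EXPLAIN
SELECT *
FROM domain.examples e
WHERE e.category_id = 105154
ORDER BY id ASC
LIMIT 1000;
```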

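Alternatively, if the number of rows with category_id = 105154 is not too high, I would suggest using a CTE first, so that the planner is forced to use the examples_categories index: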

with favorite_category as (
    select * 
    from domain.examples e 
    where e.category_id = 105154) 
select * 
from favorite_category 
order by id asc 
limit 1000;

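This fetches all records with category_id = 105154 and then sorts them by id in memory (provided the size of that fetch is less than your working memory; run show work_mem; to see what it is set to. The default is 4MB).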
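
To check whether the sort actually stays in memory, something like the following could be run in a single session. It is a sketch: the 64MB value is an arbitrary example, and EXPLAIN ANALYZE really executes the query, so it takes as long as the query itself.

```sql
-- Current setting for this session.
SHOW work_mem;

-- Raise it for this session only (example value).
SET work_mem = '64MB';

EXPLAIN (ANALYZE, BUFFERS)
WITH favorite_category AS (
    SELECT *
    FROM domain.examples e
    WHERE e.category_id = 105154)
SELECT *
FROM favorite_category
ORDER BY id ASC
LIMIT 1000;
-- In the output, look at the Sort node:
--   "Sort Method: quicksort  Memory: ..." or "top-N heapsort  Memory: ..."
--       means the sort ran in memory;
--   "Sort Method: external merge  Disk: ..."
--       means it spilled to disk.

-- Return to the configured default.
RESET work_mem;
```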

2017-02-08 21:59:28

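For Postgres, CTEs are usually *not* the solution to performance problems. – wildplasser

@wildplasser I agree! But in this case I think it makes sense, provided the number of rows with category_id = 105154 is small. Also, I'm curious: can you give an example to support your claim? I'm not saying I disagree! –

In such cases, the query without the CTE could benefit a lot (a CTE is an optimization fence). But in this particular case, it probably makes little difference. – wildplasser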
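
For comparison, a commonly used way to steer the planner away from examples_pkey without a CTE is to order by an expression instead of the bare column. This is an addition for illustration, not something proposed in the thread, and whether it helps depends on how many rows the category contains.

```sql
-- Ordering by "id + 0" yields the same order as "id", but because it is an
-- expression the index on id cannot supply the ordering, so the planner has
-- to find the matching rows another way (for example via examples_categories)
-- and sort them.
SELECT *
FROM domain.examples e
WHERE e.category_id = 105154
ORDER BY e.id + 0 ASC
LIMIT 1000;
```

Note that from PostgreSQL 12 onward a plain CTE like the one in the answer may be inlined by the planner; there, the fence behavior has to be requested explicitly with AS MATERIALIZED. On 9.4, as used in the question, a CTE is always a fence.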
