2011-06-22 110 views
4

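Postgres query optimization problem

Below are two nearly identical postgres queries that produce very different query plans and execution times. I assume the first query is fast because there are only 196 form_instance records with form_id = 'W40', whereas there are 7000 with form_id = 'W30L'. But why does a jump from 200 to 7000 records (which seems relatively small to me) cause such a dramatic increase in query time? I have tried indexing the data in various ways to speed this up, but have basically come up empty. How can I speed this up? (Note that the schemas for both tables are included at the bottom.)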

explain analyze select form_id,form_instance_id,answer,field_id 
from form_instances,field_instances 
where workflow_state = 'DRqueued' and form_instance_id = form_instances.id 
and field_id in ('Book_EstimatedDueDate','H_SubmittedDate','H_Ccode','miscarriage','miscarriage_of_multiple','stillbirth','AP_IUFD_of_multiple','maternal_death','birth_includes_transport','newborn_death','H_Pid','H_Mid1','H_Mid2','H_Mid3') 
and (form_id = 'W40'); 

QUERY PLAN                                 
Nested Loop (cost=0.00..70736.14 rows=4646 width=29) (actual time=0.000..20.000 rows=2399 loops=1) 
    -> Index Scan using form_id_and_workflow_state on form_instances (cost=0.00..1041.42 rows=507 width=8) (actual time=0.000..0.000 rows=196 loops=1) 
     Index Cond: (((form_id)::text = 'W40'::text) AND ((workflow_state)::text = 'DRqueued'::text)) 
    -> Index Scan using index_field_instances_on_form_instance_id on field_instances (cost=0.00..137.25 rows=17 width=25) (actual time=0.000..0.102 rows=12 loops=196) 
     Index Cond: (field_instances.form_instance_id = form_instances.id) 
     Filter: ((field_instances.field_id)::text = ANY ('{Book_EstimatedDueDate,H_SubmittedDate,H_Ccode,miscarriage,miscarriage_of_multiple,stillbirth,AP_IUFD_of_multiple,maternal_death,birth_includes_transport,newborn_death,H_Pid,H_Mid1,H_Mid2,H_Mid3}'::text[])) 
Total runtime: 30.000 ms 
(7 rows) 

explain analyze select form_id,form_instance_id,answer,field_id 
from form_instances,field_instances 
where workflow_state = 'DRqueued' and form_instance_id = form_instances.id 
and field_id in ('Book_EstimatedDueDate','H_SubmittedDate','H_Ccode','miscarriage','miscarriage_of_multiple','stillbirth','AP_IUFD_of_multiple','maternal_death','birth_includes_transport','newborn_death','H_Pid','H_Mid1','H_Mid2','H_Mid3') 
and (form_id = 'W30L'); 

QUERY PLAN                                
Hash Join (cost=34300.46..160865.40 rows=31045 width=29) (actual time=65670.000..74960.000 rows=102777 loops=1) 
    Hash Cond: (field_instances.form_instance_id = form_instances.id) 
    -> Bitmap Heap Scan on field_instances (cost=29232.57..152163.82 rows=531718 width=25) (actual time=64660.000..72800.000 rows=526842 loops=1) 
     Recheck Cond: ((field_id)::text = ANY ('{Book_EstimatedDueDate,H_SubmittedDate,H_Ccode,miscarriage,miscarriage_of_multiple,stillbirth,AP_IUFD_of_multiple,maternal_death,birth_includes_transport,newborn_death,H_Pid,H_Mid1,H_Mid2,H_Mid3}'::text[])) 
     -> Bitmap Index Scan on index_field_instances_on_field_id (cost=0.00..29099.64 rows=531718 width=0) (actual time=64630.000..64630.000 rows=594515 loops=1) 
       Index Cond: ((field_id)::text = ANY ('{Book_EstimatedDueDate,H_SubmittedDate,H_Ccode,miscarriage,miscarriage_of_multiple,stillbirth,AP_IUFD_of_multiple,maternal_death,birth_includes_transport,newborn_death,H_Pid,H_Mid1,H_Mid2,H_Mid3}'::text[])) 
    -> Hash (cost=5025.54..5025.54 rows=3388 width=8) (actual time=980.000..980.000 rows=10457 loops=1) 
     -> Bitmap Heap Scan on form_instances (cost=90.99..5025.54 rows=3388 width=8) (actual time=10.000..950.000 rows=10457 loops=1) 
       Recheck Cond: (((form_id)::text = 'W30L'::text) AND ((workflow_state)::text = 'DRqueued'::text)) 
       -> Bitmap Index Scan on form_id_and_workflow_state (cost=0.00..90.14 rows=3388 width=0) (actual time=0.000..0.000 rows=10457 loops=1) 
        Index Cond: (((form_id)::text = 'W30L'::text) AND ((workflow_state)::text = 'DRqueued'::text)) 
Total runtime: 75080.000 ms 

# \d form_instances 
             Table "public.form_instances" 
     Column  |   Type    |       Modifiers       
-----------------+-----------------------------+------------------------------------------------------------- 
id    | integer      | not null default nextval('form_instances_id_seq'::regclass) 
form_id   | character varying(255)  | 
created_at  | timestamp without time zone | 
updated_at  | timestamp without time zone | 
created_by_id | integer      | 
updated_by_id | integer      | 
workflow  | character varying(255)  | 
workflow_state | character varying(255)  | 
validation_data | text      | 
Indexes: 
    "form_instances_pkey" PRIMARY KEY, btree (id) 
    "form_id_and_workflow_state" btree (form_id, workflow_state) 
    "index_form_instances_on_form_id" btree (form_id) 
    "index_form_instances_on_workflow_state" btree (workflow_state) 

# \d field_instances 
             Table "public.field_instances" 
     Column  |   Type    |       Modifiers       
------------------+-----------------------------+-------------------------------------------------------------- 
id    | integer      | not null default nextval('field_instances_id_seq'::regclass) 
form_instance_id | integer      | 
created_at  | timestamp without time zone | 
updated_at  | timestamp without time zone | 
created_by_id | integer      | 
updated_by_id | integer      | 
field_id   | character varying(255)  | 
answer   | text      | 
state   | character varying(255)  | 
explanation  | text      | 
idx    | integer      | not null default 0 
Indexes: 
    "field_instances_pkey" PRIMARY KEY, btree (id) 
    "field_instances__lower_answer" btree (lower(answer)) 
    "index_field_instances_on_answer" btree (answer) 
    "index_field_instances_on_field_id" btree (field_id) 
    "index_field_instances_on_field_id_and_answer" btree (field_id, answer) 
    "index_field_instances_on_form_instance_id" btree (form_instance_id) 
    "index_field_instances_on_idx" btree (idx) 
+5

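Well, the system's estimate of how many rows there are likely to be is definitely off. We can see that in the second query it estimates 3388 rows from the bitmap index scan on form_instances but actually gets 10457. You might want to 'vacuum full analyze;' and/or 'reindex' and/or 'cluster' and see if that helps. –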

+0

I did a 'vacuum full analyze' and it made no difference, but 'reindex' made a big change. Thanks. – zippy

+0

What version of PG are you on? – Kuberchaun

Answers

1

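This was previously a comment, but since it seems to have solved the problem, I am promoting it to an actual answer.

The system's estimate of how many rows there are likely to be is off. We can see that in the second query it estimates 3388 rows from the bitmap index scan on form_instances but actually gets 10457.

So you might want to run vacuum full analyze;

Other commands that can help greatly include reindex and/or cluster.

The OP said that vacuum did not help, but reindex did.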
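
The maintenance commands mentioned in this answer could be run roughly as follows. This is a minimal sketch that uses the table and index names from the question's schema; which index to cluster on is an assumption made for illustration, not something the OP reported doing.

```sql
-- Refresh planner statistics only (cheap; does not rewrite the tables).
ANALYZE form_instances;
ANALYZE field_instances;

-- Rebuild every index on each table from scratch, discarding bloated pages.
-- REINDEX blocks writes to the table while it runs.
REINDEX TABLE form_instances;
REINDEX TABLE field_instances;

-- Optionally rewrite a table in the physical order of one of its indexes.
-- Assumption: clustering on the join column is only an illustrative choice.
-- CLUSTER takes an exclusive lock on the table for the duration.
CLUSTER field_instances USING index_field_instances_on_form_instance_id;

-- Statistics should be refreshed again after the table has been rewritten.
ANALYZE field_instances;
```

Since REINDEX and CLUSTER both lock the table, they are probably best tried during a quiet period.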

+1

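It did work, though. The strange thing is that it even chose the same query plan and still misestimated the row counts. -> Bitmap Heap Scan on form_instances (cost=92.09..3597.16 rows=3496 width=8) (actual time=60.000..100.000 rows=10462 loops=1) So I don't know why this worked. – zippy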

+0

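"vacuum full" has nothing to do with this problem, and in most cases it will actually make performance worse. See http://wiki.postgresql.org/wiki/VACUUM_FULL for details. All that is needed to update the statistics is a simple "analyze". –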

+0

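@Greg: Your statement is both true and irrelevant. The wiki page is discussing performance while the operation is running and update/write performance - it should always strictly improve read-only performance (absent index issues). The statement that the indexes are not updated is likewise true, but if you notice, I did recommend a reindex, which actually solved his problem (regardless of the vacuum). Still, it is always a good reminder that "vacuum full" should not be used without understanding its implications. Then again, that can be said of most commands. –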

1

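I'm not sure where the numbers in your summary come from, because the second query plan you posted outputs 102777 rows, while the first outputs 2399 rows. That is 43 times as many rows, so it is not surprising that a very different query plan is picked. As for why the runtime difference is even larger than that, the optimizer is moderately wrong in estimating the selectivity of the filters on form_id and workflow_state. You may want to increase the default_statistics_target value for this database and run ANALYZE again, particularly if you are on a PostgreSQL 8.3 version, where the default is very low. See Tuning Your PostgreSQL Server for more information about that parameter.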
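
As a rough illustration of the points above, the following sketch first redoes the arithmetic from the two posted plans and then raises the statistics target for just the two filtered columns. The target value of 500 is an arbitrary assumption for illustration, not a recommendation; a per-column setting is shown instead of changing default_statistics_target for the whole database.

```sql
-- Ratios taken directly from the two EXPLAIN ANALYZE outputs in the question.
SELECT round(102777.0 / 2399, 1) AS output_row_ratio,        -- rows returned: W30L vs W40
       round(10457.0 / 3388, 1)  AS w30l_actual_vs_estimate; -- form_instances scan, W30L plan

-- Collect finer-grained statistics for the filtered columns, then re-analyze.
-- Assumption: 500 is an illustrative value only.
ALTER TABLE form_instances ALTER COLUMN form_id SET STATISTICS 500;
ALTER TABLE form_instances ALTER COLUMN workflow_state SET STATISTICS 500;
ANALYZE form_instances;

-- Inspect what the planner now believes about those columns.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'form_instances'
  AND attname IN ('form_id', 'workflow_state');
```

If the estimated row counts in a fresh EXPLAIN ANALYZE move closer to the actual ones, the larger target is helping; if not, the misestimate may come from the two columns being correlated, which per-column statistics cannot capture.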

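It is quite possible that the difference between the two is so large simply because all of the data needed to answer the small query is already sitting in memory, while the larger one involves far more disk access to answer. Running each query several times may be informative, to see whether the runtime improves once the data has been read into cache. The REINDEX you did may have shrunk the index enough for it to fit in cache in both cases, which resolves the problem for now. The index may become "bloated" again, though.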
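
One way to check both ideas is sketched below, using only the names from the question's schema. The first statement lists the size of each index on field_instances, which can be compared before and after a REINDEX to see whether bloat has returned; the second is the slow query from the question, meant to be run twice in a row.

```sql
-- Size of each index on field_instances, largest first.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'field_instances'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Run this twice back to back. If the second run is much faster than the
-- first, the first run was mostly paying for disk reads, not for the plan.
EXPLAIN ANALYZE
SELECT form_id, form_instance_id, answer, field_id
FROM form_instances, field_instances
WHERE workflow_state = 'DRqueued'
  AND form_instance_id = form_instances.id
  AND field_id IN ('Book_EstimatedDueDate','H_SubmittedDate','H_Ccode',
                   'miscarriage','miscarriage_of_multiple','stillbirth',
                   'AP_IUFD_of_multiple','maternal_death',
                   'birth_includes_transport','newborn_death',
                   'H_Pid','H_Mid1','H_Mid2','H_Mid3')
  AND form_id = 'W30L';
```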