Postgres partition pruning

I have a large table in Postgres.

The table is named bigtable and has these columns:

integer |timestamp |xxx |xxx |...|xxx 
category_id|capture_time|col1|col2|...|colN

I have partitioned the table on category_id modulo 10 and on the date part of the capture_time column.

The partition tables look like this:

CREATE TABLE myschema.bigtable_d000h0(
    CHECK (category_id%10=0 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02') 
) INHERITS (myschema.bigtable); 

CREATE TABLE myschema.bigtable_d000h1(
    CHECK (category_id%10=1 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02') 
) INHERITS (myschema.bigtable);

When I run a query that uses category_id and capture_time in the WHERE clause, the partitions are not pruned as I expected.

explain select * from bigtable where capture_time >= '2012-01-01' and capture_time < '2012-01-02' and category_id=100; 

"Result (cost=0.00..9476.87 rows=1933 width=216)" 
" -> Append (cost=0.00..9476.87 rows=1933 width=216)" 
"  -> Seq Scan on bigtable (cost=0.00..0.00 rows=1 width=210)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h0 bigtable (cost=0.00..1921.63 rows=1923 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h1 bigtable (cost=0.00..776.93 rows=1 width=218)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h2 bigtable (cost=0.00..974.47 rows=1 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h3 bigtable (cost=0.00..1351.92 rows=1 width=214)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h4 bigtable (cost=0.00..577.04 rows=1 width=217)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h5 bigtable (cost=0.00..360.67 rows=1 width=219)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h6 bigtable (cost=0.00..1778.18 rows=1 width=214)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h7 bigtable (cost=0.00..315.82 rows=1 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h8 bigtable (cost=0.00..372.06 rows=1 width=219)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h9 bigtable (cost=0.00..1048.16 rows=1 width=215)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"

However, if I add the exact modulo condition (category_id%10=0) to the WHERE clause, it works perfectly:

explain select * from bigtable where capture_time >= '2012-01-01' and capture_time < '2012-01-02' and category_id=100 and category_id%10=0; 

"Result (cost=0.00..2154.09 rows=11 width=215)" 
" -> Append (cost=0.00..2154.09 rows=11 width=215)" 
"  -> Seq Scan on bigtable (cost=0.00..0.00 rows=1 width=210)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))" 
"  -> Seq Scan on bigtable_d000h0 bigtable (cost=0.00..2154.09 rows=10 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"

Is there any way to make partition pruning work correctly without adding the modulo condition to every query?

Source

2012-04-03 Dojo

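Which version are you using? I think the planner got some improvements regarding partitioning in 9.x. – 2012-04-03 17:31:16

You could make the constraint slightly simpler: CHECK (category_id % 10 = 1 AND date_trunc('month', capture_time) = '2012-01-01'::date) – 2012-04-03 17:40:16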

@a_horse_with_no_name I am using 9.1 – Dojo 2012-04-03 17:47:19

For anyone who has the same problem: I came to the conclusion that the simplest way out is to change the queries to include the modulo condition category_id%10=0.
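One way to avoid repeating the modulo condition by hand is to hide it behind a function. The following is only a sketch under assumptions: the function name bigtable_for_category and its parameters are hypothetical, and it assumes the question's myschema.bigtable exists. It builds the statement with format() (available from 9.1) so that the planner sees literal constants, which is what constraint exclusion needs at planning time.

```sql
-- Hypothetical helper: adds the matching "category_id % 10 = <n>" predicate
-- so callers only pass the category and the time range.
CREATE OR REPLACE FUNCTION myschema.bigtable_for_category(
    p_category_id integer,
    p_from        timestamp,
    p_to          timestamp)
RETURNS SETOF myschema.bigtable
LANGUAGE plpgsql STABLE AS $$
BEGIN
    -- Dynamic SQL: the values are embedded as literals, so the planner
    -- can compare them against each child's CHECK constraint.
    RETURN QUERY EXECUTE format(
        'SELECT * FROM myschema.bigtable
          WHERE capture_time >= %L AND capture_time < %L
            AND category_id = %s
            AND category_id %% 10 = %s',
        p_from, p_to, p_category_id, p_category_id % 10);
END;
$$;

-- Usage: same rows as the second query in the question.
SELECT *
  FROM myschema.bigtable_for_category(100, '2012-01-01', '2012-01-02');
```

The trade-off is that every call is planned from scratch, which is usually acceptable here because pruning itself depends on planning with the actual values.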

Source

2012-04-08 06:38:33 Dojo

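The thing is: for exclusion constraints, PostgreSQL will create an implicit index. In your case this index will be a partial one, because you are using an expression on the column and not just its value. And the documentation states (look for Example 11-2):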

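PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time.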

Hence your results: the query should contain exactly the same expression that you used when creating the CHECK constraint.
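To make the matching rule concrete, the sketch below contrasts three predicates against the partitions from the question. It assumes the question's tables exist under myschema and that constraint_exclusion is left at its default of partition. The first two queries are the ones from the question; the third is an added illustration of the "simple inequality" case and does not come from the original post.

```sql
-- Constraint exclusion must be active for inheritance children
-- ("partition" is the default; "on" also works).
SHOW constraint_exclusion;

-- 1. Not pruned: the planner does not derive "category_id % 10 = 0"
--    from "category_id = 100", so every child is scanned.
EXPLAIN SELECT * FROM myschema.bigtable
 WHERE capture_time >= '2012-01-01' AND capture_time < '2012-01-02'
   AND category_id = 100;

-- 2. Pruned: the WHERE clause repeats the CHECK expression verbatim,
--    so children whose CHECK says "category_id % 10 = 1..9" are refuted.
EXPLAIN SELECT * FROM myschema.bigtable
 WHERE capture_time >= '2012-01-01' AND capture_time < '2012-01-02'
   AND category_id = 100
   AND category_id % 10 = 0;

-- 3. Simple range implications are understood: this predicate contradicts
--    "capture_time < DATE '2012-01-02'", so the 2012-01-01 children
--    should be excluded without any extra condition.
EXPLAIN SELECT * FROM myschema.bigtable
 WHERE capture_time >= '2012-01-03';
```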

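For hash-based partitioning I prefer 2 approaches:

add a field that can take one of a limited set of values (10 in your case), preferably one that already exists by design;
specify the hash range the same way you specify the timestamp range: MINVALUE <= category_id < MAXVALUE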
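The second approach can be sketched as follows. This is an illustration under assumptions, not the answerer's actual schema: the child table names and the bucket width of 1000 are made up, and the layout is meant to replace the modulo-based children rather than coexist with them. Because the CHECK uses plain comparisons on category_id, a query with category_id = 100 should be prunable without any extra predicate.

```sql
-- Hypothetical range-based children (names and boundaries are assumptions).
CREATE TABLE myschema.bigtable_c0000_d000 (
    CHECK (category_id >= 0 AND category_id < 1000
       AND capture_time >= DATE '2012-01-01'
       AND capture_time <  DATE '2012-01-02')
) INHERITS (myschema.bigtable);

CREATE TABLE myschema.bigtable_c1000_d000 (
    CHECK (category_id >= 1000 AND category_id < 2000
       AND capture_time >= DATE '2012-01-01'
       AND capture_time <  DATE '2012-01-02')
) INHERITS (myschema.bigtable);

-- "category_id = 100" contradicts "category_id >= 1000", a simple
-- inequality the planner can reason about, so the second child
-- should not appear in the plan.
EXPLAIN SELECT * FROM myschema.bigtable
 WHERE capture_time >= '2012-01-01' AND capture_time < '2012-01-02'
   AND category_id = 100;
```

Unlike a modulo scheme, ranges do not spread consecutive category_id values evenly, so the boundaries would need to follow the real distribution of the data.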

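Also, it is possible to create 2-level partitioning:

on the first level, you create 10 partitions based on the category_id HASH;
on the second level, you create the necessary number of partitions based on the date ranges.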
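With inheritance, the two levels above might look like the sketch below. The table names are hypothetical and it assumes the question's myschema.bigtable as the root. Only one hash bucket and one day are shown; the remaining children follow the same pattern.

```sql
-- Level 1: one child per hash bucket (10 in total; bucket 0 shown).
CREATE TABLE myschema.bigtable_h0 (
    CHECK (category_id % 10 = 0)
) INHERITS (myschema.bigtable);

-- Level 2: one child per day under each bucket.
-- The bucket CHECK is inherited from bigtable_h0, so only the
-- date range has to be declared here.
CREATE TABLE myschema.bigtable_h0_d000 (
    CHECK (capture_time >= DATE '2012-01-01'
       AND capture_time <  DATE '2012-01-02')
) INHERITS (myschema.bigtable_h0);

-- Querying a first-level table directly limits planning to that
-- bucket's children; no modulo predicate is needed at this level.
EXPLAIN SELECT * FROM myschema.bigtable_h0
 WHERE capture_time >= '2012-01-01' AND capture_time < '2012-01-02'
   AND category_id = 100;
```

Queries against the root table still need the exact category_id % 10 = n predicate for the first level to be pruned.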

Though I always try to use only 1 column for partitioning; it is easier to manage.

Source

2012-04-03 19:57:02 vyegorov

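Thanks for your input. The code I posted does 2-level partitioning using 1-level inheritance. Performance-wise, it runs faster than actual 2-level inheritance. I know the other way (the way you suggested) should be faster, because fewer tables are checked at the first level, and at the next level only the tables inheriting from the qualifying first-level table have to be scanned. But in practice it is slower. – Dojo 2012-04-08 06:24:00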

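What slows partitioning down is the partition pruning logic, not the actual table scans. In both cases the optimizer prunes the tables correctly, but with 2-level inheritance it takes longer to decide which partitions to prune. – Dojo 2012-04-08 06:27:40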

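One interesting thing about 2-level partitioning is that you can query a first-level table directly, and pruning the second-level partitions then takes less time. I can use this to archive data in a way that does not hurt performance. – Dojo 2012-04-08 06:31:43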
