2

Postgres partition pruning

I have a large table in Postgres.

The table is named bigtable and has the following columns:

integer |timestamp |xxx |xxx |...|xxx 
category_id|capture_time|col1|col2|...|colN 

I have partitioned the table on category_id modulo 10 and on the date part of the capture_time column.

The partition tables look like this:

CREATE TABLE myschema.bigtable_d000h0(
    CHECK (category_id%10=0 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02') 
) INHERITS (myschema.bigtable); 

CREATE TABLE myschema.bigtable_d000h1(
    CHECK (category_id%10=1 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02') 
) INHERITS (myschema.bigtable); 

When I run a query that uses category_id and capture_time in the where clause, the partitions are not pruned as expected.

explain select * from bigtable where capture_time >= '2012-01-01' and capture_time < '2012-01-02' and category_id=100; 

"Result (cost=0.00..9476.87 rows=1933 width=216)" 
" -> Append (cost=0.00..9476.87 rows=1933 width=216)" 
"  -> Seq Scan on bigtable (cost=0.00..0.00 rows=1 width=210)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h0 bigtable (cost=0.00..1921.63 rows=1923 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h1 bigtable (cost=0.00..776.93 rows=1 width=218)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h2 bigtable (cost=0.00..974.47 rows=1 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h3 bigtable (cost=0.00..1351.92 rows=1 width=214)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h4 bigtable (cost=0.00..577.04 rows=1 width=217)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h5 bigtable (cost=0.00..360.67 rows=1 width=219)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h6 bigtable (cost=0.00..1778.18 rows=1 width=214)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h7 bigtable (cost=0.00..315.82 rows=1 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h8 bigtable (cost=0.00..372.06 rows=1 width=219)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 
"  -> Seq Scan on bigtable_d000h9 bigtable (cost=0.00..1048.16 rows=1 width=215)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))" 

However, if I add the exact modulo criterion (category_id%10=0) to the where clause, it works perfectly:

explain select * from bigtable where capture_time >= '2012-01-01' and capture_time < '2012-01-02' and category_id=100 and category_id%10=0; 

"Result (cost=0.00..2154.09 rows=11 width=215)" 
" -> Append (cost=0.00..2154.09 rows=11 width=215)" 
"  -> Seq Scan on bigtable (cost=0.00..0.00 rows=1 width=210)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))" 
"  -> Seq Scan on bigtable_d000h0 bigtable (cost=0.00..2154.09 rows=10 width=216)" 
"    Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))" 

Is there any way to make partition pruning work correctly without adding the modulo condition to every query?

Which version are you using? I think the planner got some partitioning-related improvements in 9.x. – 2012-04-03 17:31:16

You could loosen the constraint slightly: CHECK (category_id%10 = 1 AND date_trunc('month', capture_time) = '2012-01-01'::date) – 2012-04-03 17:40:16


@a_horse_with_no_name I am using 9.1 – Dojo 2012-04-03 17:47:19

Answers

1

For anyone who has the same problem: I came to the conclusion that the simplest way out is to change the queries to include the modulo condition category_id%10=0.
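
A minimal sketch of that workaround, reusing the table and values from the question. It assumes constraint_exclusion is set to partition (the default) or on, and that the planner simplifies the constant expression 100 % 10 to 0 before comparing the clause with the CHECK constraints, so the application does not have to compute the remainder itself. This applies to literals; a bound parameter in a prepared statement may not be pruned the same way.

```sql
-- Check that constraint exclusion is active ('partition' or 'on').
SHOW constraint_exclusion;

-- Repeat the partition key expression exactly as it is written in the
-- CHECK constraints. 100 % 10 is a constant expression and is expected
-- to be simplified to 0 at plan time (assumption, verify with EXPLAIN).
EXPLAIN
SELECT *
FROM bigtable
WHERE capture_time >= '2012-01-01'
  AND capture_time <  '2012-01-02'
  AND category_id = 100
  AND category_id % 10 = 100 % 10;
```

If the plan still lists all ten child tables, fall back to writing the remainder as a literal, as in the question.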

4

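The thing is: for exclusion constraints PostgreSQL will create an implicit index. In your case this index will be a partial one, because you are using an expression on the column and not just its value. And this is what the documentation states (look for Example 11-2):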

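PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time.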

Hence your results: the query has to use exactly the same expression that was used when creating the CHECK constraint.

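For hash-based partitioning I prefer one of 2 approaches:

  • add a field that can take one of a limited set of values (10 in your case), preferably one that already exists by design;
  • specify hash ranges the same way you specify timestamp ranges: MINVALUE <= category_id < MAXVALUE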
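
Both approaches can be sketched roughly as follows. The column name category_bucket, the child table names, and the range bounds 0 and 100 are assumptions made up for illustration; they are not part of the original schema.

```sql
-- Approach 1: partition on a stored column with a small set of values.
-- category_bucket is a hypothetical column holding category_id % 10,
-- filled in by the application or by a trigger.
ALTER TABLE myschema.bigtable ADD COLUMN category_bucket integer;

CREATE TABLE myschema.bigtable_b0_d20120101 (
    CHECK (category_bucket = 0
           AND capture_time >= DATE '2012-01-01'
           AND capture_time <  DATE '2012-01-02')
) INHERITS (myschema.bigtable);

-- The query still has to filter on the bucket, but it is a plain
-- column = constant comparison rather than an expression.
EXPLAIN
SELECT *
FROM myschema.bigtable
WHERE category_bucket = 0
  AND category_id = 100
  AND capture_time >= '2012-01-01'
  AND capture_time <  '2012-01-02';

-- Approach 2: partition on ranges of category_id itself.
-- The bounds 0 and 100 are illustrative.
CREATE TABLE myschema.bigtable_c000_d20120101 (
    CHECK (category_id >= 0 AND category_id < 100
           AND capture_time >= DATE '2012-01-01'
           AND capture_time <  DATE '2012-01-02')
) INHERITS (myschema.bigtable);

-- category_id = 42 falls inside [0, 100), which is the kind of simple
-- inequality implication the planner can work with, so no extra
-- condition is needed in the query.
EXPLAIN
SELECT *
FROM myschema.bigtable
WHERE category_id = 42
  AND capture_time >= '2012-01-01'
  AND capture_time <  '2012-01-02';
```

The range variant is the only one of the two that leaves existing queries unchanged, at the cost of a less even distribution than a modulo would give.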

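It is also possible to create 2-level partitioning:

  • on the first level, you create 10 partitions based on the category_id hash;
  • on the second level, you create the necessary number of partitions based on date ranges.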

That said, I always try to partition on just 1 column, as it is easier to manage.
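
The 2-level layout described above might look like the sketch below, using inheritance as in the question. The table names bigtable_h0 and bigtable_h0_d20120101 are hypothetical.

```sql
-- Level 1: one child per category_id % 10 value.
CREATE TABLE myschema.bigtable_h0 (
    CHECK (category_id % 10 = 0)
) INHERITS (myschema.bigtable);

-- Level 2: one child per day under each level-1 table. CHECK constraints
-- are inherited, so this table carries the modulo check from bigtable_h0
-- in addition to its own date range check.
CREATE TABLE myschema.bigtable_h0_d20120101 (
    CHECK (capture_time >= DATE '2012-01-01'
           AND capture_time <  DATE '2012-01-02')
) INHERITS (myschema.bigtable_h0);

-- Queries against the top-level parent still need the modulo expression
-- written exactly as in the CHECK constraint.
EXPLAIN
SELECT *
FROM myschema.bigtable
WHERE category_id = 100
  AND category_id % 10 = 0
  AND capture_time >= '2012-01-01'
  AND capture_time <  '2012-01-02';

-- Querying a level-1 table directly only involves the date constraints
-- of its own children.
EXPLAIN
SELECT *
FROM myschema.bigtable_h0
WHERE category_id = 100
  AND capture_time >= '2012-01-01'
  AND capture_time <  '2012-01-02';
```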


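Thanks for your input. The code I posted implements 2-level partitioning using 1-level inheritance. Performance-wise, it runs faster than actual 2-level inheritance. I know the other way (the one you suggest) should be faster, because fewer tables are checked at the first level, and at the next level only the tables that inherit from the qualifying first-level tables have to be scanned. But in practice it is slower. – Dojo 2012-04-08 06:24:00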


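What slows partitioning down is the partition pruning logic, not the actual table scans. In both cases the optimizer prunes the tables correctly, but with 2-level inheritance it takes longer to decide which partitions to prune. – Dojo 2012-04-08 06:27:40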


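One interesting thing about 2-level partitioning is that you can query a first-level table directly, and pruning the second-level partitions then takes less time. I can use this to archive data in a way that does not hurt performance. – Dojo 2012-04-08 06:31:43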