Indexing boolean columns in a time dimension

In the time dimension of our data warehouse we have many boolean flag columns, for example:

is_ytd (is year to date)
is_mtd (is month to date)
is_current_date
is_current_month
is_current_year

Is creating a partial index on every such column a good indexing strategy? For example:

CREATE INDEX tdim_is_current_month 
    ON calendar (is_current_month) 
    WHERE is_current_month;

Our time dimension has 136 columns and about 7,000 rows; 53 of the columns are boolean flags.
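One way to judge which flags are worth a partial index is to check how selective each one is. This is only a rough sketch: it uses three of the flag names listed above and assumes the dimension table is the calendar table from the index definition.

```sql
-- Count how many calendar rows each flag marks as true.
-- A flag that is true for only a small share of the rows is a more
-- plausible candidate for a partial index than one that is true for most rows.
SELECT count(*)                                              AS total_rows,
       sum(CASE WHEN is_current_month THEN 1 ELSE 0 END)     AS current_month_rows,
       sum(CASE WHEN is_current_year  THEN 1 ELSE 0 END)     AS current_year_rows,
       sum(CASE WHEN is_ytd           THEN 1 ELSE 0 END)     AS ytd_rows
FROM calendar;
```

With only a few thousand rows in the table, the planner may still prefer a sequential scan for flags that match a large fraction of them.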

Why do we use flags instead of deriving the required date ranges from current_date?

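To make life easier
To enforce consistency
To speed up queries
To provide flags that are not easy to derive
To make things easier in other tools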

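Ad 1) Once you have joined the time dimension (and this is almost always the case when analyzing any fact table in the data warehouse), it is much easier to write where is_current_year than where extract(year from time_date) = extract(year from current_date).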

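Ad 2) For example: defining year to date (YTD) only looks easy. We could start with: time_date between date_trunc('year', current_date) and current_date. But some people actually exclude current_date (which makes sense, because today is not finished yet). In that case we would use: time_date between date_trunc('year', current_date) and (current_date - 1). And what happens if, for some reason, the DW is not updated for a few days? Then you may want to tie YTD to the last day for which you have complete data from all source systems. When you have a common definition of what YTD means, you reduce the risk of differing interpretations.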
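To make the ambiguity concrete, here are the two ad hoc YTD definitions from the paragraph above next to the flag-based version. This is a sketch only; it assumes time_date and time_key are columns of the calendar table.

```sql
-- Variant A: year to date, including today.
SELECT time_key
FROM calendar
WHERE time_date BETWEEN date_trunc('year', current_date) AND current_date;

-- Variant B: year to date, excluding today (today is not finished yet).
SELECT time_key
FROM calendar
WHERE time_date BETWEEN date_trunc('year', current_date) AND current_date - 1;

-- Flag-based: whichever definition was agreed on is stored once in the
-- dimension, so every query returns the same set of days.
SELECT time_key
FROM calendar
WHERE is_ytd;
```

Variants A and B differ by exactly one day, which is the kind of discrepancy a shared flag is meant to rule out.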

Ad 3) I think that filtering data on an indexed boolean flag column should be faster than filtering on a dynamically computed expression.

Ad 4) Some flags are not easy to create, for example we have the flags is_first_workday_in_month and is_last_workday_in_month.
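To show why such flags are awkward to compute on the fly, here is one possible derivation. It is a sketch under a loud assumption: is_workday is a hypothetical boolean column marking working days, and this is not necessarily how our flags are actually populated.

```sql
-- Derive first/last workday of each month with window functions.
-- is_workday is a hypothetical column assumed for this example.
SELECT time_date,
       is_workday AND time_date = first_workday AS is_first_workday_in_month,
       is_workday AND time_date = last_workday  AS is_last_workday_in_month
FROM (
    SELECT time_date,
           is_workday,
           -- Earliest and latest working day within the same calendar month.
           min(CASE WHEN is_workday THEN time_date END)
               OVER (PARTITION BY date_trunc('month', time_date)) AS first_workday,
           max(CASE WHEN is_workday THEN time_date END)
               OVER (PARTITION BY date_trunc('month', time_date)) AS last_workday
    FROM calendar
) AS c;
```

Repeating an expression like this in every report is clearly less convenient than filtering on a precomputed flag.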

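Ad 5) In some tools it is easier to use an existing column than an SQL expression. For example, when creating an OLAP cube dimension, adding a table column as a hierarchy level is much easier than building that level with an SQL expression.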


Testing indexes

I tested indexes on all flags and ran explain analyze for a simple query with one fact table and the time dimension (named calendar):

select count(*) from fact_table join calendar using(time_key)

For most flags I get an index scan:

"Aggregate (cost=4022.80..4022.81 rows=1 width=0) (actual time=38.642..38.642 rows=1 loops=1)" 
" -> Hash Join (cost=13.12..4019.73 rows=1230 width=0) (actual time=38.640..38.640 rows=0 loops=1)" 
"  Hash Cond: (fact_table.time_key = calendar.time_key)" 
"  -> Seq Scan on fact_table (cost=0.00..3249.95 rows=198495 width=2) (actual time=0.006..17.769 rows=198495 loops=1)" 
"  -> Hash (cost=12.58..12.58 rows=43 width=2) (actual time=0.054..0.054 rows=43 loops=1)" 
"    Buckets: 1024 Batches: 1 Memory Usage: 2kB" 
"    -> Index Scan using cal_is_qtd on calendar (cost=0.00..12.58 rows=43 width=2) (actual time=0.014..0.049 rows=43 loops=1)" 
"     Index Cond: (is_qtd = true)" 
"Total runtime: 38.679 ms"

For some flags I get a bitmap heap scan combined with a bitmap index scan:

"Aggregate (cost=13341.07..13341.08 rows=1 width=0) (actual time=100.972..100.973 rows=1 loops=1)" 
" -> Hash Join (cost=6656.54..13001.52 rows=135820 width=0) (actual time=5.729..86.972 rows=198495 loops=1)" 
"  Hash Cond: (fact_table.time_key = calendar.time_key)" 
"  -> Seq Scan on fact_table (cost=0.00..3249.95 rows=198495 width=2) (actual time=0.012..22.667 rows=198495 loops=1)" 
"  -> Hash (cost=6597.19..6597.19 rows=4748 width=2) (actual time=5.706..5.706 rows=4748 loops=1)" 
"    Buckets: 1024 Batches: 1 Memory Usage: 158kB" 
"    -> Bitmap Heap Scan on calendar (cost=97.05..6597.19 rows=4748 width=2) (actual time=0.440..4.971 rows=4748 loops=1)" 
"     Filter: is_past_quarter" 
"     -> Bitmap Index Scan on cal_is_past_quarter (cost=0.00..95.86 rows=3249 width=0) (actual time=0.395..0.395 rows=4748 loops=1)" 
"       Index Cond: (is_past_quarter = true)" 
"Total runtime: 101.013 ms"

Only for two flags do I get a sequential scan:

"Aggregate (cost=17195.33..17195.34 rows=1 width=0) (actual time=122.108..122.108 rows=1 loops=1)" 
" -> Hash Join (cost=9231.13..16699.10 rows=198495 width=0) (actual time=23.960..108.018 rows=198495 loops=1)" 
"  Hash Cond: (fact_table.time_key = calendar.time_key)" 
"  -> Seq Scan on fact_table (cost=0.00..3249.95 rows=198495 width=2) (actual time=0.012..22.153 rows=198495 loops=1)" 
"  -> Hash (cost=9144.39..9144.39 rows=6939 width=2) (actual time=23.935..23.935 rows=6939 loops=1)" 
"    Buckets: 1024 Batches: 1 Memory Usage: 231kB" 
"    -> Seq Scan on calendar (cost=0.00..9144.39 rows=6939 width=2) (actual time=17.427..22.908 rows=6939 loops=1)" 
"     Filter: is_eoq" 
"Total runtime: 122.138 ms"

Source

2014-02-11 Tomas Greif

Curious design. Wouldn't you just derive the current year from the system time?

@DavidAldridge I added some explanation of why we use flags in the time dimension.