Designing SQL and indexes for count(*) query performance

Hello all :) I am building a tool to do some volume sampling on our Oracle 10g database. Here is the query:

SELECT count(*) 
FROM product 
JOIN customer ON product.CUSTOMER_ID = customer.ID 
WHERE 
( product.CATEGORY = 'some first category criteria' 
    AND customer.REGION = 'some first region criteria' 
    AND ...) 
OR 
( product.CATEGORY = 'some second category criteria' 
    AND customer.REGION = 'some second region criteria' 
    AND ...) 
OR ...

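I only need the count from this query. The problem is the data volume: each table holds about 30 million rows, and I would like this query to be responsive.

So far, a composite index on customer (<search criteria column>, CUSTOMER_ID) has helped a lot. I think it lets Oracle go straight to the JOIN after the index filtering step.

Each (... AND ... AND ...) block is expected to match about 50,000 rows. Every column used in the search criteria takes its values from a set of about 1,000 distinct values.

I would like to know which approaches are available given that I am only doing count(*)s, especially since Oracle has a built-in OLAP module (and a CUBE operation?). I am also sure a lot could be improved with well-thought-out indexes and hints.

How would you design this?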

Source

2013-06-04 BenoitParis

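Indexes are not free. I would not add new indexes on these large tables just to support your counting app. Also, how fresh do these counts need to be? – tbone

@tbone The data in both columns is refreshed at most once a day, so some precomputation could be done at night. – BenoitParis

Then that may be your answer. Precalculate the counts you need with a simple materialized view, point your app at the mat view, and refresh it daily during off hours. – tbone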
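The materialized view idea from the comment above could look roughly like the following. This is a minimal sketch, not the commenter's actual code: the view name category_region_counts is hypothetical, and it assumes the search criteria are only CATEGORY and REGION (the question elides further criteria with "AND ...", which would need to become additional GROUP BY columns).

```sql
-- Sketch only: category_region_counts is a hypothetical name, and the
-- grouping columns assume CATEGORY and REGION are the only criteria.
CREATE MATERIALIZED VIEW category_region_counts
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT p.category, c.region, COUNT(*) AS cnt
FROM product p, customer c
WHERE p.customer_id = c.id
GROUP BY p.category, c.region;

-- The application sums precomputed counts instead of joining the base tables.
-- NVL covers the case where no combination matches (SUM would return NULL).
SELECT NVL(SUM(cnt), 0) AS total
FROM category_region_counts
WHERE (category, region) IN
      (('some first category criteria', 'some first region criteria'),
       ('some second category criteria', 'some second region criteria'));

-- Nightly complete refresh ('C'), for example from a scheduled job.
BEGIN
  DBMS_MVIEW.REFRESH('CATEGORY_REGION_COUNTS', 'C');
END;
/
```

With about 1,000 distinct values per column, the view holds at most around one million rows, which is small compared with the 30-million-row base tables.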

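This looks like a good candidate for bitmap indexes:

Bitmap indexes are primarily designed for data warehousing or environments in which queries reference many columns in an ad hoc fashion. Situations that may call for a bitmap index include:

The indexed columns have low cardinality, that is, the number of distinct values is small compared to the number of table rows.

The indexed table is either read-only or not subject to significant modification by DML statements.

Specifically, a bitmap join index may be ideal here. The example in the manual even matches your data model. I tried to recreate the model and data below, and the bitmap join index appears to be orders of magnitude faster than the other solutions.

Sample data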

--Create tables 
create table customer 
(
    customer_id number, 
    region  varchar2(100) not null 
) nologging; 

create table product 
(
    product_id number, 
    customer_id number not null, 
    category varchar2(100) not null 
) nologging; 


--Load 30M rows, 1M rows at a time. Takes about 6 minutes. 
begin 
    for i in 1 .. 30 loop 
     insert /*+ append */ into customer 
     select (1000000*i)+level, 'Region '||trunc(dbms_random.value(1, 1000)) 
     from dual connect by level <= 1000000; 
     commit; 

     insert /*+ append */ into product 
     select (1000000*i)+level, (1000000*i)+level 
      ,'Category '||trunc(dbms_random.value(1, 1000)) 
     from dual connect by level <= 1000000; 
     commit; 
    end loop; 
end; 
/

--Add primary keys and foreign key constraints. 
alter table customer add constraint customer_pk primary key (customer_id); 
alter table product add constraint product_pk primary key (product_id); 
alter table product add constraint product_customer_fk 
    foreign key (customer_id) references customer(customer_id); 

--Gather stats 
begin 
    dbms_stats.gather_table_stats(user, 'CUSTOMER'); 
    dbms_stats.gather_table_stats(user, 'PRODUCT'); 
end; 
/

No indexes - slow

As expected, performance is terrible. This sample query takes about 75 seconds on my machine.

SELECT count(*) 
FROM product 
JOIN customer ON product.CUSTOMER_ID = customer.customer_id 
WHERE (product.CATEGORY = 'Category 1' AND customer.REGION = 'Region 1') 
OR (product.CATEGORY = 'Category 2' AND customer.REGION = 'Region 2') 
OR (product.CATEGORY = 'Category 888' AND customer.REGION = 'Region 888');

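B-tree indexes - still slow

The plan changes, but performance stays the same. I think this may be because my example is a worst case for index access: the data is truly random.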

create index customer_idx on customer(region); 
create index product_idx on product(category); 

begin 
    dbms_stats.gather_table_stats(user, 'CUSTOMER'); 
    dbms_stats.gather_table_stats(user, 'PRODUCT'); 
end; 
/

Bitmap indexes - slightly better

This improves performance a little, to about 61 seconds.

drop index customer_idx; 
drop index product_idx; 

create bitmap index customer_bidx on customer(region); 
create bitmap index product_bidx on product(category); 

begin 
    dbms_stats.gather_table_stats(user, 'CUSTOMER'); 
    dbms_stats.gather_table_stats(user, 'PRODUCT'); 
end; 
/

Bitmap join index - incredibly fast

Now the query returns almost instantly; my IDE reports 0 seconds.

--The B-tree indexes were already dropped above; drop the bitmap indexes here.
drop index customer_bidx; 
drop index product_bidx; 

create bitmap index customer_product_bjix 
on product(product.category, customer.region) 
FROM product, customer 
where product.CUSTOMER_ID = customer.customer_id; 

begin 
    dbms_stats.gather_table_stats(user, 'CUSTOMER'); 
    dbms_stats.gather_table_stats(user, 'PRODUCT'); 
end; 
/
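To confirm that the optimizer actually uses the bitmap join index, one option is to look at the execution plan. This is a sketch using the sample query from above; the plan output is not reproduced here because it depends on the environment.

```sql
-- Store the plan for the sample query in the plan table.
EXPLAIN PLAN FOR
SELECT count(*)
FROM product
JOIN customer ON product.CUSTOMER_ID = customer.customer_id
WHERE (product.CATEGORY = 'Category 1' AND customer.REGION = 'Region 1')
OR (product.CATEGORY = 'Category 2' AND customer.REGION = 'Region 2')
OR (product.CATEGORY = 'Category 888' AND customer.REGION = 'Region 888');

-- Display the stored plan.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the index is being used, the plan should reference CUSTOMER_PRODUCT_BJIX rather than full scans of both tables.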

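Index cost

The bitmap join index takes a bit longer to create than the B-tree or bitmap indexes. The B-tree indexes are very large compared with the bitmap and bitmap join indexes.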

select segment_name, bytes/1024/1024 MB 
from dba_segments 
where segment_name in ('CUSTOMER_IDX', 'PRODUCT_IDX' 
    ,'CUSTOMER_BIDX', 'PRODUCT_BIDX', 'CUSTOMER_PRODUCT_BJIX'); 


SEGMENT_NAME            MB
---------------------  ---
CUSTOMER_IDX           726
PRODUCT_IDX            792
CUSTOMER_BIDX           88
PRODUCT_BIDX            96
CUSTOMER_PRODUCT_BJIX  184

Query style

This will not affect performance, but you can shorten your query like this:

SELECT count(*) 
FROM product 
JOIN customer ON product.CUSTOMER_ID = customer.customer_id 
WHERE (product.category, customer.region) 
    in (('Category 1', 'Region 1'), 
     ('Category 2', 'Region 2'), 
     ('Category 888', 'Region 888'));

Source

2013-06-07 04:51:36

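I think you are only considering the performance of the query. Bitmaps are usually bad news for tables with even moderate DML activity. What the poster has not revealed is how the company uses these tables (beyond this specific need). I have seen too many tables with lots of indexes (bitmap and otherwise) because most developers consider only their own immediate needs (and the company does little in the way of a holistic review before adding them). Something to consider, anyway. – tbone

@tbone You are right, bitmap indexes and DML do not mix well. But given the comment "the data in both columns is refreshed at most once a day", it should be possible to build a process that avoids those problems. It may be as simple as dropping the index, modifying the table, and then recreating the index. –

I think he was referring to the freshness of the counts. I suspect the tables he is hitting are key tables that are used frequently and have high DML activity. Anyway, I think I am overthinking it at this point ;-) – tbone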
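The drop, modify, recreate process suggested in the comments might be sketched as follows. It assumes the daily refresh runs in a window when the counting query is not needed; the load step itself is a placeholder, since the thread does not describe it.

```sql
-- 1. Drop the bitmap join index before the daily load so DML is not slowed by it.
DROP INDEX customer_product_bjix;

-- 2. Run the daily refresh of PRODUCT and CUSTOMER here
--    (placeholder: the actual load is not described in the thread).

-- 3. Recreate the index and regather statistics once the load has finished.
CREATE BITMAP INDEX customer_product_bjix
ON product(product.category, customer.region)
FROM product, customer
WHERE product.customer_id = customer.customer_id;

BEGIN
    dbms_stats.gather_table_stats(user, 'CUSTOMER');
    dbms_stats.gather_table_stats(user, 'PRODUCT');
END;
/
```

Until step 3 completes, the count query falls back to the slow plans measured earlier, so this only fits if the tables really are quiet outside the refresh window, which is the concern tbone raises.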
