2012-07-06 37 views
0

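mysql - avoiding a filesort caused by an ineffective index?

I am running a query that joins several tables and filters on a date range, and I am trying to figure out how to optimize it further.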

SELECT ACC.name AS account_name,
       CAMP.account_id AS account_id,
       CAMP.name AS campaign_name,
       CAMP.id AS campaign_id,
       ADG.id AS adgroup_id,
       ADG.name AS adgroup_name,
       KW.text AS keyword_name,
       SUM(SPENT.billed_clicks) AS billed_clicks,
       KW.id AS keyword_id,
       KW.status_id AS status_id
FROM account ACC, campaign CAMP, adgroup ADG, adgroup_keyword KW
    INNER JOIN keyword_spent SPENT ON KW.id = SPENT.keyword_id
WHERE summary_date >= '2012-03-01'
    AND summary_date <= '2012-03-04'
    AND KW.adgroup_id = ADG.id
    AND ADG.campaign_id = CAMP.id
    AND CAMP.account_id = ACC.id
GROUP BY keyword_id

Running EXPLAIN on it produces the following:

+----+-------------+-------+--------+----------------------------+--------------+---------+---------------------------------+--------+----------------------------------------------+ 
| id | select_type | table | type | possible_keys    | key   | key_len | ref        | rows | Extra          | 
+----+-------------+-------+--------+----------------------------+--------------+---------+---------------------------------+--------+----------------------------------------------+ 
| 1 | SIMPLE  | SPENT | range | summary_date    | summary_date | 3  | NULL       | 752191 | Using where; Using temporary; Using filesort | 
| 1 | SIMPLE  | KW | eq_ref | PRIMARY,FK1948D0E6ED3A5544 | PRIMARY  | 8  | clicksummarydb.SPENT.keyword_id |  1 |            | 
| 1 | SIMPLE  | ADG | eq_ref | PRIMARY,FKBBC2083C29112FD0 | PRIMARY  | 8  | advertisedb.KW.adgroup_id  |  1 |            | 
| 1 | SIMPLE  | CAMP | eq_ref | PRIMARY,FKF7A90110246F33C4 | PRIMARY  | 8  | advertisedb.ADG.campaign_id  |  1 |            | 
| 1 | SIMPLE  | ACC | eq_ref | PRIMARY     | PRIMARY  | 8  | advertisedb.CAMP.account_id  |  1 |            | 
+----+-------------+-------+--------+----------------------------+--------------+---------+---------------------------------+--------+----------------------------------------------+ 

The keyword_spent table contains more than 1.5 million rows. Here is its SHOW CREATE TABLE output:

| keyword_spent | CREATE TABLE `keyword_spent` (
    `id` bigint(20) NOT NULL auto_increment, 
    `summary_date` date NOT NULL, 
    `adgroup_id` bigint(20) NOT NULL, 
    `keyword_id` bigint(20) NOT NULL, 
    `billed_clicks` int(11) default NULL, 
    `un_billed_clicks` int(11) default NULL, 
    `spent` decimal(20,5) default NULL, 
    `last_click_recno` bigint(20) default NULL, 
    `campaign_id` bigint(20) NOT NULL, 
    `account_id` bigint(20) NOT NULL, 
    `total_convs` bigint(20) unsigned default '0', 
    PRIMARY KEY (`id`), 
    UNIQUE KEY `keyword_spent_uniq` (`summary_date`,`adgroup_id`,`keyword_id`), 
    KEY `idx_account_id` (`account_id`), 
    KEY `idx_kw_id` (`keyword_id`), 
    KEY `adgroup_id` (`adgroup_id`), 
    KEY `campaign_id` (`campaign_id`), 
    KEY `summary_date` (`summary_date`) 
) ENGINE=InnoDB DEFAULT CHARSET=latin1 | 

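I don't understand why nearly 750,000 rows are being scanned when there are no more than 100,000 records in that date range.

Also, why does it do a filesort instead of using the index?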

+1

The first thing to do is get rid of the A, B, C stuff and use an inner join for each table, so that the WHERE clause is only on summary_date. – 2012-07-06 17:50:49

+0

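@Tony: I agree. I prefer the 'JOIN ... ON' syntax to the comma-style join syntax. By the way, the predicate on 'summary_date' could just as easily go in the ON clause of the JOIN; it doesn't have to be in the WHERE clause, and no WHERE clause is needed at all. – spencer7593 2012-07-06 18:09:08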

+0

I rearranged the query following both suggestions. I'm just curious whether it gives any performance boost, apart from looking cleaner. – 2012-07-06 18:15:00

Answers

1

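A filesort is not necessarily bad. As Baron Schwartz's blog post shows, a filesort is not necessarily about files. It is just a quicksort that is used when no suitable index is available.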
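To see how much sorting a query actually does, one option is to reset the session counters, run the query, and then read MySQL's Sort_* status variables. The following is only a sketch: the SELECT is a simplified stand-in for the real query, reusing the date range from the question, and FLUSH STATUS needs the RELOAD privilege.

```sql
-- Reset the session status counters so the numbers below reflect this query only.
FLUSH STATUS;

-- Simplified stand-in for the real query (illustrative only).
SELECT keyword_id, SUM(billed_clicks) AS billed_clicks
FROM keyword_spent
WHERE summary_date >= '2012-03-01'
  AND summary_date <= '2012-03-04'
GROUP BY keyword_id;

-- Sort_rows: number of rows that were sorted.
-- Sort_merge_passes: merge passes the sort needed; a large value suggests
-- the sort buffer was too small for the data being sorted.
SHOW SESSION STATUS LIKE 'Sort%';
```

If Sort_merge_passes stays at 0, the sort most likely ran in memory, which fits the point above that "filesort" does not have to involve files.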

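One idea for optimizing it: perhaps put all of the aggregated data into its own subquery and join against that? I'm thinking of something like this (adjust as needed):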

SELECT ACC.name AS account_name, 
CAMP.account_id AS account_id, 
CAMP.name AS campaign_name, 
CAMP.id AS campaign_id, 
ADG.id AS adgroup_id, 
ADG.name AS adgroup_name, 
KW.text AS keyword_name, 
KW.id AS keyword_id, 
JOINED.billed_clicks AS billed_clicks, 
JOINED.un_billed_clicks AS un_billed_clicks, 
JOINED.total_clicks AS total_clicks, 
JOINED.spent AS spent, 
JOINED.total_convs AS total_convs 
FROM account ACC 
INNER JOIN campaign CAMP ON ACC.id = CAMP.account_id 
INNER JOIN adgroup ADG ON CAMP.id = ADG.campaign_id 
INNER JOIN adgroup_keyword KW ON ADG.id = KW.adgroup_id 
INNER JOIN (SELECT 
    SUM(billed_clicks) AS billed_clicks, 
    SUM(un_billed_clicks) AS un_billed_clicks, 
    SUM(billed_clicks) + SUM(un_billed_clicks) AS total_clicks, 
    SUM(spent) AS spent, 
    SUM(total_convs) AS total_convs, 
    keyword_id
    FROM keyword_spent
    WHERE summary_date >= '2012-03-01' AND summary_date <= '2012-03-04' -- date range from the question
    GROUP BY keyword_id
) JOINED ON JOINED.keyword_id = KW.id 

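I hope I have understood the problem correctly. This solution has one benefit: the grouping and aggregation stay separate, so you don't have to worry about grouping by the other columns, which the original example never did.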

+0

That's an interesting idea. I'll give it a try. – 2012-07-06 18:05:25

+0

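@Wolfmann2000, using your query model and creating a composite index on (keyword_id, summary_date) cut the execution time from 35 seconds to just 6 seconds. So, thank you. However, I can't see how this query differs from my version in terms of execution, other than being rearranged. Can you help me understand? In fact, the EXPLAIN for your query shows me extra scans, so I'm confused. – 2012-07-09 16:44:12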

2

Try an index on all of the columns referenced in the join predicates:

CREATE INDEX keyword_spent_IX2 ON keyword_spent (keyword_id, summary_date) 

- or -

CREATE INDEX keyword_spent_IX3 ON keyword_spent (summary_date, keyword_id) 

- or you could even create a covering index that contains all of the columns referenced in the query:

CREATE INDEX keyword_spent_IX4 ON keyword_spent (keyword_id, summary_date, 
    billed_clicks, un_billed_clicks, spent, total_convs) 

The filesort operation is probably due to the GROUP BY.
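To check whether one of these indexes is actually picked up, EXPLAIN can be rerun against keyword_spent on its own. A sketch, assuming the covering index keyword_spent_IX4 above has been created; the SELECT is a cut-down version of the real query, used only to inspect the plan:

```sql
-- List the indexes currently defined on the table.
SHOW INDEX FROM keyword_spent;

-- If the covering index is chosen, the `key` column should name it and
-- `Extra` should include "Using index" (the row data itself is not read).
EXPLAIN
SELECT keyword_id, SUM(billed_clicks) AS billed_clicks
FROM keyword_spent
WHERE summary_date >= '2012-03-01'
  AND summary_date <= '2012-03-04'
GROUP BY keyword_id;

-- For comparison only: force the index and see how the plan changes.
EXPLAIN
SELECT keyword_id, SUM(billed_clicks) AS billed_clicks
FROM keyword_spent FORCE INDEX (keyword_spent_IX4)
WHERE summary_date >= '2012-03-01'
  AND summary_date <= '2012-03-04'
GROUP BY keyword_id;
```

If `Extra` still shows "Using temporary; Using filesort", the optimizer is not reading the rows in GROUP BY order from the index. Keep in mind that the `rows` column is an estimate, not an exact count.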

My preference is to use the JOIN ... ON syntax rather than the old-school comma syntax with join predicates mixed into the WHERE clause.

FROM account ACC 
    JOIN campaign CAMP ON CAMP.account_id = ACC.id 
    JOIN adgroup ADG ON ADG.campaign_id = CAMP.id 
    JOIN adgroup_keyword KW ON KW.adgroup_id = ADG.id 
    JOIN keyword_spent SPENT ON SPENT.keyword_id = KW.id 
WHERE SPENT.summary_date >= '2012-03-01' 
    AND SPENT.summary_date <= '2012-03-04' 
GROUP BY SPENT.keyword_id

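You are grouping by a subset of the non-aggregate expressions in the SELECT list. Most other relational database systems would raise an error for this; MySQL is more liberal.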
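For a form that stricter databases would also accept, every non-aggregated SELECT expression can be listed in the GROUP BY. This is a sketch based on the query from the question, with only the GROUP BY expanded. It assumes each `id` column is its table's primary key, as the EXPLAIN output suggests, so the result should match grouping by the keyword alone.

```sql
SELECT ACC.name AS account_name,
       CAMP.account_id AS account_id,
       CAMP.name AS campaign_name,
       CAMP.id AS campaign_id,
       ADG.id AS adgroup_id,
       ADG.name AS adgroup_name,
       KW.text AS keyword_name,
       SUM(SPENT.billed_clicks) AS billed_clicks,
       KW.id AS keyword_id,
       KW.status_id AS status_id
FROM account ACC
    JOIN campaign CAMP ON CAMP.account_id = ACC.id
    JOIN adgroup ADG ON ADG.campaign_id = CAMP.id
    JOIN adgroup_keyword KW ON KW.adgroup_id = ADG.id
    JOIN keyword_spent SPENT ON SPENT.keyword_id = KW.id
WHERE SPENT.summary_date >= '2012-03-01'
    AND SPENT.summary_date <= '2012-03-04'
-- Every non-aggregated expression from the SELECT list appears here.
GROUP BY KW.id, KW.text, KW.status_id,
         ADG.id, ADG.name,
         CAMP.id, CAMP.name, CAMP.account_id,
         ACC.name
```

MySQL also has an ONLY_FULL_GROUP_BY SQL mode that makes the server check this itself. The longer GROUP BY may change how the grouping is executed, so it is worth comparing EXPLAIN output before adopting it.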

+0

Ah, I didn't realize that keyword_id needed to be indexed. I'll try it and see how it goes. – 2012-07-06 18:05:01

+0

@Anand: If you want to avoid the filesort, try an index that has the 'GROUP BY' column as its leading column. – spencer7593 2012-07-06 18:06:04

+0

I created an index on (keyword_id, summary_date), since my original query groups by keyword_id. It didn't get rid of the filesort. – 2012-07-06 18:19:15

1

Try an index on summary_date (it is used in the WHERE) followed by keyword_id, and explicitly move the date range inside the JOIN:

ON (SPENT.keyword_id = KW.id AND SPENT.summary_date BETWEEN ... AND ...) 

Also, try creating a VIEW that gives you the aggregated fields on SPENT. Ideally, the optimizer should understand this better and save you some time.

CREATE VIEW SPENT AS SELECT 
    keyword_id, 
    SUM(billed_clicks) AS billed_clicks, 
    SUM(un_billed_clicks) AS un_billed_clicks, 
    SUM(spent) AS spent, 
    SUM(total_convs) AS total_convs 
FROM keyword_spent GROUP BY keyword_id; 

This needs an index with keyword_id first and summary_date second, and the JOIN with the VIEW should be equivalent to a SELECT over the 100,000 rows.
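Note that the view as defined has no summary_date column, so an outer query cannot restrict it to a date range. One possible variant keeps the date in the grouping and re-aggregates in the outer query. This is a sketch only, and the view name keyword_spent_daily is hypothetical:

```sql
-- Hypothetical view name; aggregates per keyword per day.
CREATE VIEW keyword_spent_daily AS
SELECT keyword_id,
       summary_date,
       SUM(billed_clicks) AS billed_clicks,
       SUM(un_billed_clicks) AS un_billed_clicks,
       SUM(spent) AS spent,
       SUM(total_convs) AS total_convs
FROM keyword_spent
GROUP BY keyword_id, summary_date;

-- Filter on the date range, then sum the daily totals per keyword.
SELECT keyword_id, SUM(billed_clicks) AS billed_clicks
FROM keyword_spent_daily
WHERE summary_date >= '2012-03-01'
  AND summary_date <= '2012-03-04'
GROUP BY keyword_id;
```

Because the view contains a GROUP BY, MySQL cannot merge it into the outer query and may materialize it into a temporary table before applying the date filter, so check the EXPLAIN output before relying on this.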
