2013-10-14

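How to optimize GROUP BY on calculated fields (using an index)?

I have a large data table (almost 10 million records) which, for performance reasons, has a secondary companion table holding aggregates. This aggregate table is periodically filled with the data that has not been aggregated so far: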

REPLACE INTO aggregate (channel_id, type, timestamp, value, count) 
SELECT channel_id, 'day' AS type, MAX(timestamp) AS timestamp, SUM(value) AS value, COUNT(timestamp) AS count FROM data 
-- upper bound: midnight today, in milliseconds
WHERE timestamp < UNIX_TIMESTAMP(DATE_FORMAT(NOW(), "%Y-%m-%d")) * 1000 
-- lower bound: midnight of the day after the newest 'day' aggregate, or 0 if there is none
AND timestamp >= IFNULL((SELECT UNIX_TIMESTAMP(DATE_ADD(FROM_UNIXTIME(MAX(timestamp)/1000, "%Y-%m-%d"), 
    INTERVAL 1 day)) * 1000 FROM aggregate WHERE type = 'day'), 0) 
-- one result row per channel and calendar day
GROUP BY channel_id, YEAR(FROM_UNIXTIME(timestamp/1000)), DAYOFYEAR(FROM_UNIXTIME(timestamp/1000)); 
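To make the semantics of this statement concrete, here is a minimal pure-Python sketch of what the REPLACE INTO ... SELECT computes: the upper bound is the start of the current day, the lower bound is the start of the day after the newest aggregated 'day' row (or 0 if there is none), and the matching rows are bucketed per channel and calendar day. The function names are hypothetical, and the sketch assumes the MySQL session time zone is UTC; with a local time zone the day boundaries shift accordingly.

```python
from datetime import datetime, timezone

DAY_MS = 86_400_000
UTC = timezone.utc  # assumption: the MySQL session time zone is UTC (no DST shifts)


def midnight_ms(ts_ms):
    """Start of the UTC day containing ts_ms, in milliseconds (hypothetical helper)."""
    d = datetime.fromtimestamp(ts_ms / 1000, tz=UTC)
    return int(datetime(d.year, d.month, d.day, tzinfo=UTC).timestamp()) * 1000


def aggregate_days(rows, last_aggregated_ms, now_ms):
    """Illustrative mirror of the REPLACE INTO ... SELECT (hypothetical helper).

    rows: iterable of (channel_id, timestamp_ms, value) tuples.
    last_aggregated_ms: MAX(timestamp) of existing 'day' rows, or None if there are none.
    Returns {(channel_id, day_start_ms): (max_timestamp, sum_value, count)}.
    """
    upper = midnight_ms(now_ms)  # UNIX_TIMESTAMP(DATE_FORMAT(NOW(), "%Y-%m-%d")) * 1000
    # IFNULL(<midnight of the day after the newest aggregated day>, 0)
    lower = 0 if last_aggregated_ms is None else midnight_ms(last_aggregated_ms) + DAY_MS
    buckets = {}
    for channel_id, ts, value in rows:
        if lower <= ts < upper:  # only complete days that are not aggregated yet
            key = (channel_id, midnight_ms(ts))  # one bucket per channel and calendar day
            max_ts, total, count = buckets.get(key, (ts, 0.0, 0))
            buckets[key] = (max(max_ts, ts), total + value, count + 1)
    return buckets


rows = [(1, 0, 1.0), (1, 3_600_000, 2.0), (2, DAY_MS + 5, 4.0), (1, 2 * DAY_MS, 8.0)]
print(aggregate_days(rows, None, 2 * DAY_MS + 1000))
```

The row at exactly `2 * DAY_MS` belongs to "today" and is therefore left for a later run, which is what the strict `<` upper bound in the SQL does as well.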

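I found that the SELECT part of the statement is very slow (2+ seconds on a fast PC), even when it returns no data. Since the aggregation has to run on an embedded device, this is a concern. This is the plan: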

id | select_type | table     | type  | key     | key_len | rows    | Extra
1  | PRIMARY     | data      | ALL   |         |         | 9184560 | Using where; Using temporary; Using filesort
2  | SUBQUERY    | aggregate | index | ts_uniq | 22      | 1940    | Using where; Using index

The subquery by itself is instant. Apparently data does not use the channel_id/timestamp index because of the calculations in the GROUP BY clause:

CREATE TABLE `data` (
    `id` int(11) NOT NULL AUTO_INCREMENT, 
    `channel_id` int(11) DEFAULT NULL, 
    `timestamp` bigint(20) NOT NULL, 
    `value` double NOT NULL, 
    PRIMARY KEY (`id`), 
    UNIQUE KEY `ts_uniq` (`channel_id`,`timestamp`), 
    KEY `IDX_ADF3F36372F5A1AA` (`channel_id`) 
) ENGINE=MyISAM AUTO_INCREMENT=10432870 DEFAULT CHARSET=latin1; 

Can the query be optimized further?

Update: added the requested information

SHOW INDEXES FROM data; 

Table | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Null | Index_type
data  | 0          | PRIMARY    | 1            | id          | A         | 9184560     |      | BTREE
data  | 0          | ts_uniq    | 1            | channel_id  | A         | 164         | YES  | BTREE
data  | 0          | ts_uniq    | 2            | timestamp   | A         | 9184560     |      | BTREE
data  | 1          | IDX_ADF3.. | 1            | channel_id  | A         | 164         | YES  | BTREE

CREATE TABLE `aggregate` (
    `id` int(11) NOT NULL AUTO_INCREMENT, 
    `channel_id` int(11) NOT NULL, 
    `type` varchar(8) NOT NULL, 
    `timestamp` bigint(20) NOT NULL, 
    `value` double NOT NULL, 
    `count` int(11) NOT NULL, 
    PRIMARY KEY (`id`), 
    UNIQUE KEY `ts_uniq` (`channel_id`,`type`,`timestamp`) 
) ENGINE=MyISAM AUTO_INCREMENT=1941 DEFAULT CHARSET=latin1; 

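I also noticed that the query becomes instant when the GROUP BY is changed to channel_id, timestamp. Unfortunately, adding the calculated data as columns is not desirable, because the grouping is computed dynamically.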

I don't understand why the GROUP BY or the index should cause such a problem when there is actually no data to group at all. I tried running

SELECT channel_id, 'day' AS type, MAX(timestamp) AS timestamp, SUM(value) AS value, COUNT(timestamp) AS count FROM data 
WHERE timestamp < UNIX_TIMESTAMP(DATE_FORMAT(NOW(), "%Y-%m-%d")) * 1000 
AND timestamp >= IFNULL((SELECT UNIX_TIMESTAMP(DATE_ADD(FROM_UNIXTIME(MAX(timestamp)/1000, "%Y-%m-%d"), INTERVAL 1 day)) * 1000 
    FROM aggregate WHERE type = 'day'), 0) 

which is just as slow, so the GROUP BY does not seem to be the problem?

Update 2

Digging further down this path shows that

SELECT channel_id, 'day' AS type, timestamp, value, 1 FROM data 
WHERE timestamp >= (SELECT UNIX_TIMESTAMP(DATE_ADD(FROM_UNIXTIME(MAX(timestamp)/1000, "%Y-%m-%d"), 
    INTERVAL 1 day)) * 1000 FROM aggregate WHERE type = 'day'); 

is still slow (1.4 sec), so this is not a GROUP BY problem at all.

Update 3

This is still slow:

SELECT channel_id, 'day' AS type, timestamp, value, 1 FROM data WHERE timestamp >= 1380837600000; 

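So the problem is that the inner comparison on timestamp cannot make use of the (channel_id, timestamp) index, even though it is part of the GROUP BY clause. Which leads to the question: how can that index be forced?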
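Update 3 pins the slowdown on the range predicate itself: the only index containing timestamp is ts_uniq (channel_id, timestamp), and a condition on timestamp alone does not match its leftmost column, so the whole table is read. The sketch below reproduces this in an in-memory SQLite database, used purely as a stand-in for MySQL (the planners differ, and SQLite's EXPLAIN QUERY PLAN is not MySQL's EXPLAIN). It compares three plans: the range on timestamp alone, the same range combined with an IN list on the leading column channel_id, and the range after adding a separate index that starts with timestamp. The channel ids and the index name idx_timestamp are hypothetical; in MySQL the corresponding change would be something like ALTER TABLE data ADD INDEX idx_timestamp (timestamp).

```python
import sqlite3

# SQLite is only a stand-in for MySQL here; planner details and EXPLAIN output differ.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE data (
    id INTEGER PRIMARY KEY,
    channel_id INTEGER,
    timestamp INTEGER NOT NULL,
    value REAL NOT NULL)""")
conn.execute("CREATE UNIQUE INDEX ts_uniq ON data (channel_id, timestamp)")


def plan(sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN for sql."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


CUTOFF = 1380837600000
range_only = "SELECT channel_id, timestamp, value FROM data WHERE timestamp >= ?"

# 1. Range on the second index column only: ts_uniq is not usable, the table is scanned.
before = plan(range_only, (CUTOFF,))

# 2. An equality (IN list, hypothetical channel ids) on the leading column lets
#    ts_uniq serve the timestamp range as well.
per_channel = ("SELECT channel_id, timestamp, value FROM data "
               "WHERE channel_id IN (1, 2, 3) AND timestamp >= ?")
with_prefix = plan(per_channel, (CUTOFF,))

# 3. A separate index whose leading column is timestamp (hypothetical name).
conn.execute("CREATE INDEX idx_timestamp ON data (timestamp)")
after = plan(range_only, (CUTOFF,))

print(before, with_prefix, after, sep="\n")
```

The IN-list variant keeps the existing index but requires knowing the channel ids up front; the extra index avoids that at the cost of additional index maintenance on every insert.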

Would you mind pasting the 'aggregate' table creation? – VancleiP

Also, you can check whether all indexes are used correctly with SHOW INDEXES FROM data; ... I have doubts about the ('channel_id', 'timestamp') unique key... – VancleiP

You could try replacing 'GROUP BY channel_id, YEAR(FROM_UNIXTIME(timestamp/1000)), DAYOFYEAR(FROM_UNIXTIME(timestamp/1000));' with 'GROUP BY channel_id, DATE(FROM_UNIXTIME(timestamp/1000))' –
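The suggestion in this comment changes only the grouping expression: one calendar date instead of the (year, day-of-year) pair, which partitions the rows in the same way. A small check of that equivalence, again with SQLite standing in for MySQL: date(x, 'unixepoch') and strftime('%Y' / '%j', x, 'unixepoch') play the role of DATE(FROM_UNIXTIME(...)), YEAR(...) and DAYOFYEAR(...), and everything is evaluated in UTC by assumption. It simplifies the expression, but per Update 3 it would not change how the WHERE clause reads the table.

```python
import sqlite3

# SQLite stand-in: 'unixepoch' interprets seconds as UTC, whereas MySQL's FROM_UNIXTIME
# uses the session time zone (assumption: UTC here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (channel_id INTEGER, timestamp INTEGER NOT NULL, value REAL NOT NULL)")
DAY_MS = 86_400_000
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [
    (1, 0, 1.0), (1, 3_600_000, 2.0), (1, DAY_MS, 5.0),
    (1, 365 * DAY_MS, 7.0),  # 1971-01-01: same day of year as 1970-01-01, different year
])

SELECT = "SELECT channel_id, MAX(timestamp), SUM(value), COUNT(*) FROM data "

# Original grouping: year plus day of year.
by_year_doy = conn.execute(SELECT + """GROUP BY channel_id,
    strftime('%Y', timestamp / 1000, 'unixepoch'),
    strftime('%j', timestamp / 1000, 'unixepoch') ORDER BY 2""").fetchall()

# Suggested grouping: a single calendar-date expression.
by_date = conn.execute(SELECT + """GROUP BY channel_id,
    date(timestamp / 1000, 'unixepoch') ORDER BY 2""").fetchall()

print(by_date)
```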

Answer

Add year and dayofyear columns to the data table, plus an index on (channel_id, year, dayofyear). Populate the two new columns when inserting a row.
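A minimal sketch of this approach, again with SQLite standing in for MySQL: the day bucket is materialized at write time in two extra columns covered by an index on (channel_id, year, dayofyear), so the grouping no longer depends on a computed expression. The helper insert_reading and the index name idx_day are hypothetical, and the day is derived in UTC by assumption.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # SQLite used only as a stand-in for MySQL
conn.execute("""CREATE TABLE data (
    id INTEGER PRIMARY KEY,
    channel_id INTEGER,
    timestamp INTEGER NOT NULL,
    value REAL NOT NULL,
    year INTEGER NOT NULL,
    dayofyear INTEGER NOT NULL)""")
conn.execute("CREATE INDEX idx_day ON data (channel_id, year, dayofyear)")  # hypothetical name


def insert_reading(conn, channel_id, ts_ms, value):
    """Hypothetical insert helper: derives year and day of year (UTC assumed) at write time."""
    t = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).timetuple()
    conn.execute(
        "INSERT INTO data (channel_id, timestamp, value, year, dayofyear) VALUES (?, ?, ?, ?, ?)",
        (channel_id, ts_ms, value, t.tm_year, t.tm_yday))


for ch, ts, v in [(1, 0, 1.0), (1, 3_600_000, 2.0), (1, 86_400_000, 5.0)]:
    insert_reading(conn, ch, ts, v)

# Group on the stored columns instead of on YEAR()/DAYOFYEAR() expressions.
rows = conn.execute("""SELECT channel_id, year, dayofyear, MAX(timestamp), SUM(value), COUNT(*)
    FROM data GROUP BY channel_id, year, dayofyear ORDER BY 1, 2, 3""").fetchall()
print(rows)
```

The trade-off is the one raised in the reply below: the bucket is fixed at insert time, so a different grouping (for example by week) would need yet another stored column.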

Unfortunately not an option, and apparently not the problem either. – andig