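How to optimize GROUP BY on computed fields (using an index)?
I have a large data table (close to 10 million records) which, for performance reasons, has a secondary aggregated companion table. The aggregate table is periodically populated with the data that has not been aggregated so far: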
REPLACE INTO aggregate (channel_id, type, timestamp, value, count)
SELECT channel_id, 'day' AS type, MAX(timestamp) AS timestamp, SUM(value) AS value, COUNT(timestamp) AS count FROM data
WHERE timestamp < UNIX_TIMESTAMP(DATE_FORMAT(NOW(), "%Y-%m-%d")) * 1000
AND timestamp >= IFNULL((SELECT UNIX_TIMESTAMP(DATE_ADD(FROM_UNIXTIME(MAX(timestamp)/1000, "%Y-%m-%d"),
INTERVAL 1 day)) * 1000 FROM aggregate WHERE type = 'day'), 0)
GROUP BY channel_id, YEAR(FROM_UNIXTIME(timestamp/1000)), DAYOFYEAR(FROM_UNIXTIME(timestamp/1000));
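I found that the SELECT part of the statement is very slow (2+ seconds on a fast PC), even when it returns no data. Since the aggregation needs to run on an embedded device, this is a concern. Here is the plan: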
id  select_type  table      type   key      key_len  rows     Extra
1   PRIMARY      data       ALL                      9184560  Using where; Using temporary; Using filesort
2   SUBQUERY     aggregate  index  ts_uniq  22       1940     Using where; Using index
The subquery by itself is instant. Apparently the channel_id/timestamp index on data is not used because of the computation in the GROUP BY clause:
CREATE TABLE `data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`channel_id` int(11) DEFAULT NULL,
`timestamp` bigint(20) NOT NULL,
`value` double NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ts_uniq` (`channel_id`,`timestamp`),
KEY `IDX_ADF3F36372F5A1AA` (`channel_id`)
) ENGINE=MyISAM AUTO_INCREMENT=10432870 DEFAULT CHARSET=latin1;
Can the query be optimized any further?
Update: added the requested information
SHOW INDEXES FROM data;
Table  Non_unique  Key_name    Seq_in_index  Column_name  Collation  Cardinality  Null  Index_type
data   0           PRIMARY     1             id           A          9184560            BTREE
data   0           ts_uniq     1             channel_id   A          164          YES   BTREE
data   0           ts_uniq     2             timestamp    A          9184560            BTREE
data   1           IDX_ADF3..  1             channel_id   A          164          YES   BTREE
CREATE TABLE `aggregate` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`channel_id` int(11) NOT NULL,
`type` varchar(8) NOT NULL,
`timestamp` bigint(20) NOT NULL,
`value` double NOT NULL,
`count` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ts_uniq` (`channel_id`,`type`,`timestamp`)
) ENGINE=MyISAM AUTO_INCREMENT=1941 DEFAULT CHARSET=latin1;
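I also noticed that the query becomes instant when the GROUP BY is changed to channel_id, timestamp. Unfortunately, adding the calculated grouping values as columns is not desirable, because the grouping is computed dynamically.
I cannot understand why the GROUP BY index should be such a problem when there is not even any data to group. I tried running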
SELECT channel_id, 'day' AS type, MAX(timestamp) AS timestamp, SUM(value) AS value, COUNT(timestamp) AS count FROM data
WHERE timestamp < UNIX_TIMESTAMP(DATE_FORMAT(NOW(), "%Y-%m-%d")) * 1000
AND timestamp >= IFNULL((SELECT UNIX_TIMESTAMP(DATE_ADD(FROM_UNIXTIME(MAX(timestamp)/1000, "%Y-%m-%d"), INTERVAL 1 day)) * 1000
FROM aggregate WHERE type = 'day'), 0)
which is just as slow, so GROUP BY does not seem to be the problem?
Update 2
Digging further down this path shows that
SELECT channel_id, 'day' AS type, timestamp, value, 1 FROM data
WHERE timestamp >= (SELECT UNIX_TIMESTAMP(DATE_ADD(FROM_UNIXTIME(MAX(timestamp)/1000, "%Y-%m-%d"),
INTERVAL 1 day)) * 1000 FROM aggregate WHERE type = 'day');
is still slow (1.4 sec), so GROUP BY is not the problem at all.
Update 3
This is still slow as well:
SELECT channel_id, 'day' AS type, timestamp, value, 1 FROM data WHERE timestamp >= 1380837600000;
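So the problem is that the inner comparison on timestamp cannot make use of the channel_id, timestamp index, even though that index is supposed to be usable for the GROUP BY clause. This leads to the question: how can that index be forced?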
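For reference, a minimal sketch of what forcing the index looks like in MySQL, using the query from Update 3. A B-tree index is generally usable for range lookups only through its leftmost columns, and ts_uniq leads with channel_id, so the hint alone may not avoid a full scan. The second statement is a hypothetical alternative rather than something tried in the question: the index name idx_timestamp is an assumption, and the extra index would cost space and insert time on an embedded device.

```sql
-- Index hint: ask MySQL to use the existing composite index ts_uniq.
-- channel_id is its leading column, so a range on timestamp alone
-- may still have to read the whole index.
SELECT channel_id, 'day' AS type, timestamp, value, 1
FROM data FORCE INDEX (ts_uniq)
WHERE timestamp >= 1380837600000;

-- Hypothetical alternative: an index that leads with timestamp, so the
-- range condition can use it directly. The name idx_timestamp is assumed.
ALTER TABLE data ADD INDEX idx_timestamp (`timestamp`);

-- Inspect the plan again to see which key is chosen.
EXPLAIN SELECT channel_id, 'day' AS type, timestamp, value, 1
FROM data
WHERE timestamp >= 1380837600000;
```

Comparing the EXPLAIN output before and after shows whether the type column changes from ALL to range.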
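Would you mind pasting the CREATE TABLE statement for the 'aggregate' table? – VancleiP
Also, you could use SHOW INDEXES FROM data to check whether all indexes are being used properly; ... I have doubts about the ('channel_id','timestamp') unique key... – VancleiP
Could you try replacing 'GROUP BY channel_id, YEAR(FROM_UNIXTIME(timestamp/1000)), DAYOFYEAR(FROM_UNIXTIME(timestamp/1000));' with 'GROUP BY channel_id, DATE(FROM_UNIXTIME(timestamp/1000))'? –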
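Written out in full, the last suggestion would look roughly like the sketch below. It is the SELECT from the question with only the grouping expression changed. DATE(...) is still a computed expression, so this simplifies the statement but is not expected, by itself, to let the GROUP BY use an index.

```sql
-- Same aggregation as in the question; only the GROUP BY differs.
-- DATE(...) yields one value per calendar day, which matches the
-- grouping of the YEAR(...)/DAYOFYEAR(...) pair.
SELECT channel_id, 'day' AS type, MAX(timestamp) AS timestamp,
       SUM(value) AS value, COUNT(timestamp) AS count
FROM data
WHERE timestamp < UNIX_TIMESTAMP(DATE_FORMAT(NOW(), "%Y-%m-%d")) * 1000
  AND timestamp >= IFNULL((SELECT UNIX_TIMESTAMP(DATE_ADD(FROM_UNIXTIME(MAX(timestamp)/1000, "%Y-%m-%d"),
      INTERVAL 1 day)) * 1000 FROM aggregate WHERE type = 'day'), 0)
GROUP BY channel_id, DATE(FROM_UNIXTIME(timestamp/1000));
```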