2016-03-05

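MySQL: optimizing a GROUP BY query on a huge table

I have a huge table of about 40 million rows (GPS tracker positions), recorded every 10 seconds from multiple devices within the company. I only want to select the first row of each minute, so I used GROUP BY. The problem is that the table grows every 10 seconds. I have tried almost everything and searched for hours, so I decided to ask a question.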

I am using MySQL 5.7.11 with a 50 GB InnoDB buffer pool. The server is a Xeon X5650 with 64 GB of RAM.

Table structure:

CREATE TABLE `eventData` (
    `id` bigint(20) NOT NULL, 
    `position` point NOT NULL, 
    `speed` decimal(6,2) DEFAULT NULL, 
    `time` datetime DEFAULT NULL, 
    `device_id` int(9) DEFAULT NULL, 
    `processed` tinyint(1) NOT NULL DEFAULT '0', 
    `time_m` datetime GENERATED ALWAYS AS ((`time` - interval second(`time`) second)) VIRTUAL 
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_czech_ci ROW_FORMAT=DYNAMIC; 

ALTER TABLE `eventData` 
    ADD PRIMARY KEY (`id`), 
    ADD KEY `time` (`time`), 
    ADD KEY `device_id` (`device_id`,`processed`), 
    ADD KEY `time_m` (`time_m`); 

SQL:

SELECT e.time, e.time_m, X(e.position) AS lat, Y(e.position) AS lng 
FROM eventData AS e 
WHERE 
    e.device_id = 86 AND 
    e.time BETWEEN '2016-02-29' AND '2016-03-06' 
    GROUP BY DAY(e.time),HOUR(e.time),MINUTE(e.time); 

EXPLAIN:

EXPLAIN SELECT e.time, e.time_m, X(e.position) AS lat, Y(e.position) AS lng FROM eventData AS e WHERE e.device_id = 86 AND e.time BETWEEN '2016-02-29' AND '2016-03-06' GROUP BY DAY(e.time),HOUR(e.time),MINUTE(e.time); 
+----+-------------+-------+------------+------+----------------+-----------+---------+-------+---------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys  | key       | key_len | ref   | rows    | filtered | Extra                                                               |
+----+-------------+-------+------------+------+----------------+-----------+---------+-------+---------+----------+---------------------------------------------------------------------+
|  1 | SIMPLE      | e     | NULL       | ref  | time,device_id | device_id | 5       | const | 2122632 |     6.40 | Using index condition; Using where; Using temporary; Using filesort |
+----+-------------+-------+------------+------+----------------+-----------+---------+-------+---------+----------+---------------------------------------------------------------------+

DESCRIBE:

DESCRIBE eventData; 
+------------------+------------------------+------+-----+---------+-------------------+
| Field            | Type                   | Null | Key | Default | Extra             |
+------------------+------------------------+------+-----+---------+-------------------+
| id               | bigint(20)             | NO   | PRI | NULL    | auto_increment    |
| position         | point                  | NO   |     | NULL    |                   |
| speed            | decimal(6,2)           | YES  |     | NULL    |                   |
| time             | datetime               | YES  | MUL | NULL    |                   |
| device_id        | int(9)                 | YES  | MUL | NULL    |                   |
| processed        | tinyint(1)             | NO   |     | 0       |                   |
| time_m           | datetime               | YES  | MUL | NULL    | VIRTUAL GENERATED |
+------------------+------------------------+------+-----+---------+-------------------+

What I have already tried:

  • No GROUP BY: ~0.06 s
  • GROUP BY day, hour, minute: ~4.76 s
  • GROUP BY the virtual column (time_m): ~4.92 s
  • GROUP BY e.time DIV 500: ~5.02 s

I need to get this well under 5 seconds. Please help.

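Performance questions should include `EXPLAIN ANALYZE` and some information about table size, indexes, current timing, desired timing, etc. "Slow" is a relative term, and we need a real value to compare against. [MySQL](http://dba.stackexchange.com/questions/15371/how-do-i-get-the-execution-plan-for-a-view) –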

Hello, I have included the EXPLAIN and DESCRIBE output for the table. ANALYZE TABLE reports OK. Thank you. – Tomas
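
The details requested in the comment above can be collected with standard statements. The following is a minimal sketch, assuming the table lives in the currently selected schema. Note that MySQL 5.7 has no `EXPLAIN ANALYZE` (it arrived in 8.0.18), so `EXPLAIN FORMAT=JSON` is used here as the closest built-in alternative.

```sql
-- Approximate row count and on-disk size (for InnoDB these are estimates).
SELECT table_rows,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'eventData';

-- Index definitions with their cardinality estimates.
SHOW INDEX FROM eventData;

-- Execution plan including the optimizer's cost estimates.
EXPLAIN FORMAT=JSON
SELECT e.time, e.time_m, X(e.position) AS lat, Y(e.position) AS lng
FROM eventData AS e
WHERE e.device_id = 86
  AND e.time BETWEEN '2016-02-29' AND '2016-03-06'
GROUP BY DAY(e.time), HOUR(e.time), MINUTE(e.time);
```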

Answers

1

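You can partition the table, for example by year. Because each partition's indexes are much smaller, this should improve performance significantly. If that is not possible in your environment, try using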

GROUP BY date_format(e.time,'%d%H%i');

0

1) You can try a composite index on (device_id, time).

2) Try grouping by the virtual column:

SELECT MIN(e.time), e.time_m, X(e.position) AS lat, Y(e.position) AS lng 
FROM eventData AS e 
WHERE 
    e.device_id = 86 AND 
    e.time BETWEEN '2016-02-29' AND '2016-03-06' 
    GROUP BY e.time_m;
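
The two suggestions above can be combined, as in the sketch below. It is an illustration under stated assumptions, not part of the answer: the index name `device_time` is made up, and the join is a swapped-in "groupwise minimum" rewrite. The rewrite is used because selecting `X(e.position)` next to `MIN(e.time)` does not guarantee that the position comes from the row with the earliest time, and MySQL 5.7 rejects such a query under its default `ONLY_FULL_GROUP_BY` mode.

```sql
-- Hypothetical index name; lets the device filter and the time range
-- be resolved from a single index.
ALTER TABLE eventData ADD KEY device_time (device_id, `time`);

-- Step 1 (derived table f): earliest timestamp within each minute.
-- Step 2 (outer query): join back to fetch the full row for that timestamp.
SELECT e.`time`, e.time_m, X(e.position) AS lat, Y(e.position) AS lng
FROM eventData AS e
JOIN (
    SELECT MIN(`time`) AS first_time
    FROM eventData
    WHERE device_id = 86
      AND `time` BETWEEN '2016-02-29' AND '2016-03-06'
    GROUP BY time_m
) AS f ON e.`time` = f.first_time
WHERE e.device_id = 86
ORDER BY e.`time`;
```

If a device ever stores two rows with the same timestamp, both are returned for that minute. Whether this version beats the 5-second mark on the data in question is not established in this thread, so it is worth checking with EXPLAIN before and after adding the index.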