MySQL GROUP BY slower when using an index
I'm running on an AWS m4.large (2 vCPUs, 8 GB RAM) and I'm seeing slightly surprising behavior with MySQL and GROUP BY. I have this test database:
CREATE TABLE demo (
time INT,
word VARCHAR(30),
count INT
);
CREATE INDEX timeword_idx ON demo(time, word);
I insert 4 million records with (uniformly) random words "t%s" % random.randint(0, 30000) and times random.randint(0, 86400).
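For anyone reproducing the benchmark, here is a minimal sketch of a data generator matching that description. It assumes `count` is always 1 (the question does not say what values it holds), and the function names and the `demo.tsv` file name are hypothetical. It also computes how many distinct (time, word) pairs 4 million uniform rows should produce.

```python
import csv
import math
import random
from typing import Iterator, Optional, Tuple

TIME_VALUES = 86400 + 1   # random.randint(0, 86400) is inclusive on both ends
WORD_VALUES = 30000 + 1   # random.randint(0, 30000) likewise


def generate_rows(n: int, seed: Optional[int] = None) -> Iterator[Tuple[int, str, int]]:
    """Yield n (time, word, count) rows with the distribution described in the question."""
    rng = random.Random(seed)
    for _ in range(n):
        time = rng.randint(0, 86400)
        word = "t%s" % rng.randint(0, 30000)
        count = 1  # ASSUMPTION: the question does not say what values `count` holds
        yield time, word, count


def write_tsv(path: str, n: int, seed: Optional[int] = None) -> None:
    """Write rows as tab-separated lines, the default field/line format of LOAD DATA."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(generate_rows(n, seed))


def expected_distinct_pairs(n: int) -> float:
    """Expected number of distinct (time, word) pairs among n uniform random rows."""
    cells = TIME_VALUES * WORD_VALUES
    # cells * (1 - (1 - 1/cells) ** n), computed with log1p/expm1 to avoid cancellation
    return -cells * math.expm1(n * math.log1p(-1 / cells))
```

Usage: `write_tsv("demo.tsv", 4_000_000, seed=1)`, then `LOAD DATA LOCAL INFILE 'demo.tsv' INTO TABLE demo (time, word, count);` (this needs `local_infile` enabled on client and server). `expected_distinct_pairs(4_000_000)` comes out at about 3,996,915, very close to the 3,996,922 groups reported below, so almost every (time, word) pair is unique and the GROUP BY collapses almost nothing.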
SELECT word, time, sum(count) FROM demo GROUP BY time, word;
3996922 rows in set (1 min 28.29 sec)
EXPLAIN SELECT word, time, sum(count) FROM demo GROUP BY time, word;
+----+-------------+-------+-------+---------------+--------------+---------+------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------------+---------+------+---------+-------+
| 1 | SIMPLE | demo | index | NULL | timeword_idx | 38 | NULL | 4002267 | |
+----+-------------+-------+-------+---------------+--------------+---------+------+---------+-------+
Then I run the same query without the index:
SELECT word, time, sum(count) FROM demo IGNORE INDEX (timeword_idx) GROUP BY time, word;
3996922 rows in set (34.75 sec)
EXPLAIN SELECT word, time, sum(count) FROM demo IGNORE INDEX (timeword_idx) GROUP BY time, word;
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
| 1 | SIMPLE | demo | ALL | NULL | NULL | NULL | NULL | 4002267 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
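As you can see, the query takes about 2.5 times longer with the index (1 min 28.29 sec vs. 34.75 sec). I'm not that surprised: by using the index, the query may avoid having to read the time and word columns, but unfortunately the index is so sparse (3,996,922 groups from 4 million rows, so almost every (time, word) pair is unique) that this shouldn't gain much. On the contrary, because count is not in the index, it has to be fetched from the table for every index entry, which turns a sequential scan into a random access pattern.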
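To illustrate the random-access argument, here is a toy model (an assumption-laden sketch, not how InnoDB actually performs I/O): table rows sit on pages in insertion order, a small LRU cache stands in for the buffer pool, and we count page misses for (A) a full scan in storage order versus (B) walking the rows in (time, word) order and visiting each row's page to read `count`. The page size, cache size, and function names are hypothetical.

```python
import random
from collections import OrderedDict
from typing import Iterable, Tuple


def count_lru_misses(page_sequence: Iterable[int], cache_pages: int) -> int:
    """Count page reads that miss an LRU cache holding `cache_pages` pages."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    misses = 0
    for page in page_sequence:
        if page in cache:
            cache.move_to_end(page)          # hit: mark as most recently used
        else:
            misses += 1                      # miss: the page must be read from disk
            cache[page] = None
            if len(cache) > cache_pages:
                cache.popitem(last=False)    # evict the least recently used page
    return misses


def simulate(n_rows: int = 100_000, rows_per_page: int = 100,
             cache_pages: int = 50, seed: int = 0) -> Tuple[int, int]:
    """Return (misses for a full scan, misses for index-order row lookups)."""
    rng = random.Random(seed)
    # Toy table: row i is stored on page i // rows_per_page, i.e. in insertion order.
    keys = [(rng.randint(0, 86400), "t%s" % rng.randint(0, 30000)) for _ in range(n_rows)]

    # Plan A (IGNORE INDEX): read every page once, in storage order.
    scan_pages = (i // rows_per_page for i in range(n_rows))

    # Plan B (index on (time, word) only): walk the index in key order and, for
    # each entry, visit the row's page to read `count`.
    index_order = sorted(range(n_rows), key=keys.__getitem__)
    lookup_pages = (i // rows_per_page for i in index_order)

    return count_lru_misses(scan_pages, cache_pages), count_lru_misses(lookup_pages, cache_pages)
```

With the defaults, plan A misses exactly once per page (1,000 misses), while plan B misses on most of its 100,000 lookups because consecutive index entries point to essentially random pages. The model ignores the index's own pages, read-ahead, and the real buffer pool size, so the ratio it prints is qualitative and will not match the measured 2.5x.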
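I just want to confirm that this is the reason, and to ask whether there is a compact rule of thumb for when using an index with GROUP BY ends up making performance worse.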
EDIT:
I followed Gordon Linoff's answer and used:
CREATE INDEX timeword_idx ON demo(time, word, count);
The "covering index" computes the result about 10 times faster than the full scan:
SELECT word, time, sum(count) FROM demo GROUP BY time, word;
3996922 rows in set (3.36 sec)
EXPLAIN SELECT word, time, sum(count) FROM demo GROUP BY time, word;
+----+-------------+-------+-------+---------------+--------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------------+---------+------+---------+-------------+
| 1 | SIMPLE | demo | index | NULL | timeword_idx | 43 | NULL | 4002267 | Using index |
+----+-------------+-------+-------+---------------+--------------+---------+------+---------+-------------+
Very impressive!
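Why the covering index helps can be sketched in a few lines: once count is in the index, the index entries carry every column the query needs, already sorted by (time, word), so the aggregation is a single streaming pass with no table lookups and no temporary table. The sketch below (hypothetical function names, a model of the idea rather than MySQL's executor, assuming count is never NULL) contrasts that with accumulating unsorted full-scan rows into a temporary table and sorting afterwards, roughly what "Using temporary; Using filesort" describes.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str, int]          # (time, word, count); count assumed NOT NULL
Result = Tuple[str, int, int]       # (word, time, sum(count)), as in the SELECT list


def group_sum_from_covering_index(index_entries: Iterable[Row]) -> Iterator[Result]:
    """One pass over entries already sorted by (time, word), like a covering index scan."""
    for (time, word), entries in groupby(index_entries, key=itemgetter(0, 1)):
        # All entries of a group are adjacent, so each group can be emitted immediately.
        yield word, time, sum(count for _, _, count in entries)


def group_sum_with_temp_table(rows: Iterable[Row]) -> List[Result]:
    """Unsorted rows from a full scan: fill a temporary hash table, then sort it."""
    temp = defaultdict(int)
    for time, word, count in rows:
        temp[(time, word)] += count
    return [(word, time, total) for (time, word), total in sorted(temp.items())]
```

For example, with `rows = [(5, "t1", 2), (1, "t2", 1), (5, "t1", 3), (1, "t1", 4)]`, both `list(group_sum_from_covering_index(sorted(rows)))` and `group_sum_with_temp_table(rows)` give `[("t1", 1, 4), ("t2", 1, 1), ("t1", 5, 5)]`; only the first avoids holding all groups in memory.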
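The other part of a "compact rule" for using an index concerns predicates (conditions) that limit the number of rows that need to be accessed, and whether MySQL can make effective use of an index range scan operation. If every row in the table has to be accessed, and the query isn't using a "covering" index, then lookups to pages in the underlying table are required. That's like visiting every block in the index once, and visiting every block in the table many times. If this is an InnoDB table with no primary key or unique index, the cluster key is an internal row ID. – spencer7593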
The "covering index" gave amazing results. I updated the question to show them. – neverlastn
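spencer7593's point about predicates and range scans can be sketched the same way. Below, a sorted Python list stands in for the (time, word, count) index and two binary searches stand in for descending the B-tree, so a condition such as a hypothetical `WHERE time BETWEEN 3600 AND 7199` only touches the matching slice instead of every entry. The function name is hypothetical, and the `hi + 1` bound assumes time is an integer, as it is in the demo table.

```python
from bisect import bisect_left
from typing import List, Tuple

Entry = Tuple[int, str, int]   # (time, word, count), sorted like the index


def index_range_scan(index_entries: List[Entry], lo: int, hi: int) -> List[Entry]:
    """Return the entries with lo <= time <= hi from a list sorted by (time, word, count)."""
    # (lo,) sorts before every entry whose time is lo, so this finds the first time >= lo.
    start = bisect_left(index_entries, (lo,))
    # Since time is an integer, the first entry with time >= hi + 1 ends the range.
    stop = bisect_left(index_entries, (hi + 1,))
    return index_entries[start:stop]
```

Finding the range costs O(log n); after that only matching entries are read. With 4 million rows spread uniformly over 86,401 seconds, a one-hour range would cover roughly 167,000 entries, about 4% of the index, which is where an index can win even without covering all columns.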