我在嘗試加速只有200萬行大約需要11秒的查詢時遇到問題。 Here is a link to my sqlfiddle。這裏是我試圖運行的聲明和我的EXPLAIN聲明。如何加快帶有多個連接的Group By語句?
查詢:
SELECT crawl.pk Pk,domains.domain Domain,
CONCAT(schemes.scheme, "://", domains.domain, remainders.remainder) Uri,
crawl.redirect Redirect FROM crawl
LEFT JOIN dates ON crawl.date_crawled=dates.pk
LEFT JOIN schemes ON crawl.scheme=schemes.pk
LEFT JOIN domains ON crawl.domain=domains.pk
LEFT JOIN remainders ON crawl.remainder=remainders.pk
WHERE (dates.date < CURDATE() - INTERVAL 30 DAY)
AND crawl.redirect=0
GROUP BY crawl.domain
ORDER BY crawl.date_crawled ASC
LIMIT 50
說明:
+----+-------------+------------+--------+-----------------------+-----------------------+---------+----------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+-----------------------+-----------------------+---------+----------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | dates | ALL | PRIMARY,date | NULL | NULL | NULL | 7 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | crawl | ref | date_crawled_redirect | date_crawled_redirect | 8 | mytable.dates.pk,const | 408644 | |
| 1 | SIMPLE | schemes | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.scheme | 1 | |
| 1 | SIMPLE | domains | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.domain | 1 | |
| 1 | SIMPLE | remainders | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.remainder | 1 | |
+----+-------------+------------+--------+-----------------------+-----------------------+---------+----------------------------+--------+----------------------------------------------+
5 rows in set (2.26 sec)
編輯#1:每評論 正如我已經更換了左加入W /加入並通過加入移動日期過濾器。可悲的是,這並沒有縮短查詢時間。
SELECT crawl.pk Pk,domains.domain Domain, CONCAT(schemes.scheme, "://", domains.domain, remainders.remainder) Uri, crawl.redirect Redirect
FROM crawl
JOIN schemes ON crawl.scheme=schemes.pk
JOIN domains ON crawl.domain=domains.pk
JOIN remainders ON crawl.remainder=remainders.pk
JOIN dates ON crawl.date_crawled=dates.pk AND dates.date < CURDATE() - INTERVAL 30 DAY
WHERE crawl.redirect=0
GROUP BY crawl.domain
ORDER BY crawl.date_crawled ASC
LIMIT 50
EDIT#2: 我更新說明:
+----+-------------+------------+--------+---------------------------------------------------------+-----------------------+---------+----------------------------+--------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------------------------------------------------+-----------------------+---------+----------------------------+--------+-----------------------------------------------------------+
| 1 | SIMPLE | dates | range | PRIMARY,date,date_pk,dateBtreeIdx,pk | date_pk | 3 | NULL | 4 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | crawl | ref | domain_remainder,remainder,scheme,date_crawled_redirect | date_crawled_redirect | 8 | mytable.dates.pk,const | 408644 | |
| 1 | SIMPLE | schemes | ALL | PRIMARY | NULL | NULL | NULL | 2 | Using where; Using join buffer |
| 1 | SIMPLE | domains | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.domain | 1 | |
| 1 | SIMPLE | remainders | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.remainder | 1 | |
+----+-------------+------------+--------+---------------------------------------------------------+-----------------------+---------+----------------------------+--------+-----------------------------------------------------------+
EDIT#3
+----+--------------------+------------+-----------------+------------------------------------------+---------+---------+----------------------------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+-----------------+------------------------------------------+---------+---------+----------------------------+---------+---------------------------------+
| 1 | PRIMARY | schemes | ALL | PRIMARY | NULL | NULL | NULL | 2 | Using temporary; Using filesort |
| 1 | PRIMARY | crawl | ref | domain_remainder,remainder,scheme,domain | scheme | 4 | mytable.schemes.pk | 1448223 | Using where |
| 1 | PRIMARY | domains | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.domain | 1 | |
| 1 | PRIMARY | remainders | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.remainder | 1 | |
| 2 | DEPENDENT SUBQUERY | dates | unique_subquery | PRIMARY,date,date_pk,dateBtreeIdx,pk | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+------------+-----------------+------------------------------------------+---------+---------+----------------------------+---------+---------------------------------+
5 rows in set (0.04 sec)
EDIT#4:
+----+-------------+------------+--------+--------------------------------------+-------------------------+---------+----------------------------+---------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------------------+-------------------------+---------+----------------------------+---------+-----------------------------------------------------------+
| 1 | SIMPLE | dates | range | PRIMARY,date,date_pk,dateBtreeIdx,pk | date_pk | 3 | NULL | 4 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | schemes | ALL | PRIMARY | NULL | NULL | NULL | 2 | Using join buffer |
| 1 | SIMPLE | crawl | ref | scheme_domain_remainder | scheme_domain_remainder | 4 | mytable.schemes.pk | 1455517 | Using where |
| 1 | SIMPLE | domains | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.domain | 1 | |
| 1 | SIMPLE | remainders | eq_ref | PRIMARY | PRIMARY | 4 | mytable.crawl.remainder | 1 | |
+----+-------------+------------+--------+--------------------------------------+-------------------------+---------+----------------------------+---------+-----------------------------------------------------------+
5 rows in set (0.04 sec)
EDIT#5
SELECT urls.pk PK, domains.domain Domain, CONCAT(schemes.scheme, "://", domains.domain, remainders.remainder) Uri, urls.redirect Redirect, urls.date_crawled DC FROM
(SELECT * FROM (
SELECT * FROM crawl as urls ORDER BY date_crawled ASC
) AS tmp GROUP BY tmp.domain) as urls
JOIN schemes ON urls.scheme=schemes.pk
JOIN domains ON urls.domain=domains.pk
JOIN remainders ON urls.remainder=remainders.pk
JOIN dates ON urls.date_crawled=dates.pk AND dates.date < CURDATE() - INTERVAL 30 DAY
WHERE urls.redirect=0
ORDER BY urls.date_crawled ASC
LIMIT 50
只是爲了確認..你要超過30天的所有記錄???或者你的意思是>在過去30天內獲取所有記錄。您的索引以其他方式顯示正常 – DRapp 2014-11-09 01:15:05
50個超過30天的記錄。 – 2014-11-09 01:44:08
你需要所有這些的左連接嗎?並嘗試將條件放在WHERE中作爲你自己的聯接的一部分。 – 2014-11-09 02:09:57