2012-08-24 78 views
-1

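How to optimize this monster query

I've been wrestling with the following query (and a few others like it), and I feel I'm missing something, or that I'm using the wrong type of database.

The query gets, for each of the last 10 years, the total number of new films and the total number of films that stopped showing (closed) in a specific town, compared with the UK as a whole. These queries are run for many towns and for many years.

Other queries do something similar, sometimes with a UNION ALL at the end that adds a query fetching the record year for opens or closures.

There are also queries that look at monthly and quarterly data instead of yearly data, and some of them only compare historical opens/closures for a particular quarter (e.g. Q3) or month (e.g. March).

Here is a query that compares London with the UK for 2012: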

SELECT inc.opening_year as year, inc.number_of_films as opens, 
    diss.number_of_films as closures, inc.uk_films as uk_opens, 
    diss.uk_films as uk_closures 
FROM 
(SELECT film_db.opening_year, uk.number_of_films as uk_films, 
     COUNT(film_db.id_film_db) as number_of_films 
    FROM film_db 
    JOIN postcodes ON id_postcodes = opening_postcode_id 
    JOIN towns ON id_towns = town_id AND town = 'London' 
    JOIN (SELECT opening_year, COUNT(film_db.id_film_db) as number_of_films 
      FROM film_db 
      WHERE opening_year <= 2012 AND opening_year >= (2012 - 10) 
      GROUP BY opening_year 
     ) uk ON uk.opening_year = film_db.opening_year 
    WHERE film_db.opening_year <= 2012 AND film_db.opening_year >= (2012 - 10) 
    GROUP BY film_db.opening_year 
    ORDER BY film_db.opening_year DESC 
) inc 
JOIN 
(SELECT film_db.closing_year, uk.number_of_films as uk_films, 
     COUNT(film_db.id_film_db) as number_of_films 
    FROM film_db 
    JOIN postcodes ON id_postcodes = postcode_id 
    JOIN towns ON id_towns = town_id AND town = 'London' 
    JOIN (SELECT closing_year, COUNT(film_db.id_film_db) as number_of_films 
      FROM film_db 
      WHERE film_db.closing_year <= 2012 AND film_db.closing_year >= (2012 - 10) 
      GROUP BY film_db.closing_year 
     ) uk ON uk.closing_year = film_db.closing_year 
    WHERE film_db.closing_year <= 2012 AND film_db.closing_year >= (2012 - 10) 
    GROUP BY film_db.closing_year 
    ORDER BY film_db.closing_year DESC 
) diss ON diss.closing_year = inc.opening_year 

The SHOW CREATE TABLE output for the tables is as follows:

film_db:

CREATE TABLE `film_db` (
    `id_film_db` int(11) NOT NULL AUTO_INCREMENT, 
    `film_name` varchar(255) DEFAULT NULL, 
    `category` varchar(100) DEFAULT NULL, 
    `status` varchar(50) DEFAULT NULL, 
    `opening_date` date DEFAULT NULL, 
    `opening_year` int(4) DEFAULT NULL, 
    `opening_month` int(2) DEFAULT NULL, 
    `opening_quarter` int(1) DEFAULT NULL, 
    `closing_date` date DEFAULT NULL, 
    `closing_year` int(4) DEFAULT NULL, 
    `closing_month` int(2) DEFAULT NULL, 
    `closing_quarter` int(1) DEFAULT NULL, 
    `datetime` timestamp NULL DEFAULT CURRENT_TIMESTAMP, 
    `postcode_id` int(4) NOT NULL DEFAULT '0', 
    `opening_postcode_id` int(4) NOT NULL DEFAULT '0', 
    PRIMARY KEY (`id_film_db`), 
    KEY `opening_date` (`opening_date`), 
    KEY `status` (`status`), 
    KEY `postcode_id` (`postcode_id`), 
    KEY `type` (`category`), 
    KEY `opening_year` (`opening_year`), 
    KEY `opening_month` (`opening_month`,`opening_year`) USING BTREE, 
    KEY `opening_quarter` (`opening_quarter`,`opening_year`) USING BTREE, 
    KEY `closing_year` (`closing_year`), 
    KEY `closing_month` (`closing_year`,`closing_month`), 
    KEY `closing_quarter` (`closing_year`,`closing_quarter`), 
    KEY `closing_date` (`closing_date`), 
    KEY `opening_closing_date` (`opening_date`,`closing_date`), 
    KEY `opening_postcode` (`opening_postcode_id`), 
    FULLTEXT KEY `film_name` (`film_name`) 
) ENGINE=MyISAM AUTO_INCREMENT=10649173 DEFAULT CHARSET=utf8 

postcodes:

CREATE TABLE `postcodes` (
    `id_postcodes` int(4) NOT NULL AUTO_INCREMENT, 
    `postcode` varchar(9) NOT NULL, 
    `town_id` int(4) NOT NULL, 
    `lat` float NOT NULL, 
    `lng` float NOT NULL, 
    PRIMARY KEY (`id_postcodes`), 
    UNIQUE KEY `postcode` (`postcode`) USING BTREE, 
    KEY `town` (`town_id`) 
) ENGINE=MyISAM AUTO_INCREMENT=5705 DEFAULT CHARSET=latin1 

towns:

CREATE TABLE `towns` (
    `id_towns` int(4) NOT NULL AUTO_INCREMENT, 
    `town` varchar(150) NOT NULL, 
    `county_id` int(3) NOT NULL, 
    PRIMARY KEY (`id_towns`), 
    KEY `county` (`county_id`) 
) ENGINE=MyISAM AUTO_INCREMENT=1606 DEFAULT CHARSET=latin1 

Here is the EXPLAIN EXTENDED output:

id | select_type | table      | type   | possible_keys                                          | key          | key_len | ref                         | rows    | filtered | Extra
1  | PRIMARY     | <derived2> | ALL    |                                                        |              |         |                             | 11      | 100      |
1  | PRIMARY     | <derived4> | ALL    |                                                        |              |         |                             | 11      | 100      | Using where; Using join buffer
4  | DERIVED     | <derived5> | ALL    |                                                        |              |         |                             | 11      | 100      | Using where; Using temporary; Using filesort
4  | DERIVED     | film_db    | ref    | postcode_id,closing_year,closing_month,closing_quarter | closing_year | 5       | uk.closing_year             | 2       | 100      | Using where
4  | DERIVED     | postcodes  | eq_ref | PRIMARY,town                                           | PRIMARY      | 4       | film_db.postcode_id         | 1       | 100      |
4  | DERIVED     | towns      | eq_ref | PRIMARY                                                | PRIMARY      | 4       | postcodes.town_id           | 1       | 100      | Using where
5  | DERIVED     | film_db    | ALL    | closing_year,closing_month,closing_quarter             |              |         |                             | 9895680 | 47.66    | Using where; Using temporary; Using filesort
2  | DERIVED     | <derived3> | ALL    |                                                        |              |         |                             | 11      | 100      | Using where; Using temporary; Using filesort
2  | DERIVED     | film_db    | ref    | opening_year,opening_postcode                          | opening_year | 5       | uk.opening_year             | 3       | 100      | Using where
2  | DERIVED     | postcodes  | eq_ref | PRIMARY,town                                           | PRIMARY      | 4       | film_db.opening_postcode_id | 1       | 100      |
2  | DERIVED     | towns      | eq_ref | PRIMARY                                                | PRIMARY      | 4       | postcodes.town_id           | 1       | 100      | Using where
3  | DERIVED     | film_db    | ALL    | opening_year                                           |              |         |                             | 9895680 | 54.53    | Using where; Using temporary; Using filesort

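As you can see, in the two UK-wide subqueries (ids 3 and 5) MySQL doesn't think that filtering the film_db table by year will make any difference to performance, so it uses no key and scans all 9,895,680 rows.

So:

Can I improve this query so that it makes better use of the indexes?

Can I improve the indexes so that the query runs faster?

Is there another type of database (not MySQL) that I should be using for this kind of query, where I am mostly interested in counting the number of entries under complex conditions and joins?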

+0

What is this? I suggest you create an ['sqlfiddle'](http://sqlfiddle.com). – diEcho

+0

I'm not going to create an sqlfiddle with 10,000,000 rows... I was just trying to provide all the information I thought would be helpful. – Jon

+2

Just create the tables and the necessary dummy data, together with the query above. – diEcho

Answers

1

This is the first thing I would try:

-- Yearly UK-wide count of films that opened
CREATE TABLE opens 
SELECT opening_year, COUNT(film_db.id_film_db) as number_of_films 
FROM film_db 
WHERE opening_year <= 2012 AND opening_year >= (2012 - 10) 
GROUP BY opening_year;

-- Yearly UK-wide count of films that closed
CREATE TABLE closures 
SELECT closing_year, COUNT(film_db.id_film_db) as number_of_films 
FROM film_db 
WHERE film_db.closing_year <= 2012 AND film_db.closing_year >= (2012 - 10) 
GROUP BY film_db.closing_year;

I would use these two tables instead of the subqueries you are using now.
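To make that concrete, here is a rough sketch of the original query with the two inner `uk` subqueries replaced by the `opens` and `closures` tables created above. The table aliases `f`, `p` and `t` and the use of `BETWEEN` are illustrative choices, not something from the question, and the query has not been run against the real data, so treat it as a starting point rather than a tested rewrite.

```sql
-- Sketch: same result shape as the original query, but the UK-wide yearly
-- counts are read from the precomputed opens/closures tables.
SELECT inc.opening_year     AS year,
       inc.number_of_films  AS opens,
       diss.number_of_films AS closures,
       inc.uk_films         AS uk_opens,
       diss.uk_films        AS uk_closures
FROM (
    -- Films opened per year in the town, plus the UK total for that year
    SELECT f.opening_year,
           uk.number_of_films  AS uk_films,
           COUNT(f.id_film_db) AS number_of_films
    FROM film_db f
    JOIN postcodes p ON p.id_postcodes = f.opening_postcode_id
    JOIN towns t     ON t.id_towns = p.town_id AND t.town = 'London'
    JOIN opens uk    ON uk.opening_year = f.opening_year
    WHERE f.opening_year BETWEEN 2012 - 10 AND 2012
    GROUP BY f.opening_year, uk.number_of_films
) inc
JOIN (
    -- Films closed per year in the town, plus the UK total for that year
    SELECT f.closing_year,
           uk.number_of_films  AS uk_films,
           COUNT(f.id_film_db) AS number_of_films
    FROM film_db f
    JOIN postcodes p ON p.id_postcodes = f.postcode_id
    JOIN towns t     ON t.id_towns = p.town_id AND t.town = 'London'
    JOIN closures uk ON uk.closing_year = f.closing_year
    WHERE f.closing_year BETWEEN 2012 - 10 AND 2012
    GROUP BY f.closing_year, uk.number_of_films
) diss ON diss.closing_year = inc.opening_year
ORDER BY inc.opening_year DESC;
```

The two full scans of `film_db` (ids 3 and 5 in the EXPLAIN output) should disappear from the plan, since each summary table holds only one row per year.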

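> Other queries do something similar, sometimes with a UNION ALL at the end that adds a query fetching the record year for opens or closures. There are also queries that look at monthly and quarterly data instead of yearly data, and some of them only compare historical opens/closures for a particular quarter (e.g. Q3) or month (e.g. March).

I assume you run these selects far more often than the contents of the opens/closures tables change, so there is no need to regenerate these tables every time you run such a query.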
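One way to keep the summary tables current is to rebuild them on a schedule instead of per query. The sketch below assumes a full rebuild (for example from a nightly job) is acceptable and drops the year filter so the same tables serve any reporting window; both of those are assumptions, not something stated in the question.

```sql
-- Sketch: periodic full rebuild of the yearly UK-wide summary tables.
-- Assumes opens and closures already exist with the columns
-- (opening_year | closing_year, number_of_films) created earlier.
TRUNCATE TABLE opens;
INSERT INTO opens (opening_year, number_of_films)
SELECT opening_year, COUNT(id_film_db)
FROM film_db
WHERE opening_year IS NOT NULL
GROUP BY opening_year;

TRUNCATE TABLE closures;
INSERT INTO closures (closing_year, number_of_films)
SELECT closing_year, COUNT(id_film_db)
FROM film_db
WHERE closing_year IS NOT NULL
GROUP BY closing_year;
```

Note that each table is briefly empty between its TRUNCATE and INSERT, so a report that runs at that moment would return no rows. The monthly and quarterly variants could use analogous tables grouped by (year, month) or (year, quarter).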


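> Can I improve this query so that it makes better use of the indexes?
> Can I improve the indexes so that the query runs faster?
> Is there another type of database (not MySQL) that I should be using for this kind of query, where I am mostly interested in counting the number of entries under complex conditions and joins?

Of course there are many other possible improvements, and there should certainly be a way to make MySQL use the indexes. You should note that the database engine will not combine separate single-column indexes for this kind of lookup; that is, in this case the index on opening_postcode_id and the index on opening_year cannot be used together. I can't figure out why neither is used, but I can tell you for sure that two indexes like these will improve the query: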

KEY `opening_year_postcode` (`opening_year`, `opening_postcode_id`) 
KEY `closing_year_postcode` (`closing_year`, `postcode_id`) 

See this SO answer: https://stackoverflow.com/a/6295744/176569
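As a sketch of how the two composite indexes above could be added and then checked, the statements below use the index names and columns from the KEY definitions; the `EXPLAIN` query is a cut-down, town-level part of the original query with illustrative aliases (`f`, `p`, `t`).

```sql
-- Add both composite indexes in one ALTER TABLE so the table is rebuilt once.
-- On a MyISAM table of roughly 10 million rows this can take a while and
-- blocks writes while it runs.
ALTER TABLE film_db
  ADD KEY opening_year_postcode (opening_year, opening_postcode_id),
  ADD KEY closing_year_postcode (closing_year, postcode_id);

-- Check whether the optimizer now lists and picks the new index
-- (see the possible_keys and key columns of the output).
EXPLAIN
SELECT f.opening_year, COUNT(f.id_film_db) AS number_of_films
FROM film_db f
JOIN postcodes p ON p.id_postcodes = f.opening_postcode_id
JOIN towns t     ON t.id_towns = p.town_id AND t.town = 'London'
WHERE f.opening_year BETWEEN 2012 - 10 AND 2012
GROUP BY f.opening_year;
```

Whether the optimizer actually chooses the new index depends on the data distribution, so it is worth comparing the EXPLAIN output and the run time before and after the change.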


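I have learned over the years that this kind of performance tuning is an incremental process. You have to try several tricks and evaluate the performance gain of each, and in the end you will probably apply only one or two.

At this point I would not consider dropping MySQL for another database vendor. The cause of your performance problem is probably not MySQL.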