2015-03-02 42 views
4

Optimizing a MySQL query with hourly time zone conversion and GROUP BY

This is my table in MySQL 5.5, containing 300,000 records:

CREATE TABLE `campaign_logs` (
    `domain` varchar(50) DEFAULT NULL, 
    `campaign_id` varchar(50) DEFAULT NULL, 
    `subscriber_id` varchar(50) DEFAULT NULL, 
    `message` varchar(21000) DEFAULT NULL, 
    `log_time` datetime DEFAULT NULL, 
    `log_type` varchar(50) DEFAULT NULL, 
    `level` varchar(50) DEFAULT NULL, 
    `campaign_name` varchar(500) DEFAULT NULL, 
    KEY `subscriber_id_index` (`subscriber_id`), 
    KEY `log_type_index` (`log_type`), 
    KEY `log_time_index` (`log_time`), 
    KEY `campid_domain_logtype_logtime_subid_index` (`campaign_id`,`domain`,`log_type`,`log_time`,`subscriber_id`), 
    KEY `domain_logtype_logtime_index` (`domain`,`log_type`,`log_time`) 
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

In the query below, I GROUP BY the hour of log_time after converting it to a given time zone.

QUERY

SELECT 
    log_type 
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date 
    ,count(*) AS total 
    ,count(DISTINCT subscriber_id) d 
FROM 
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index) 
WHERE 
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_OPENED' 
    AND log_time BETWEEN 
     CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND 
     CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30') 
GROUP BY log_date 

UNION ALL 

SELECT 
    log_type 
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date 
    ,count(*) AS total 
    ,count(DISTINCT subscriber_id) d 
FROM 
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index) 
WHERE 
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_SENT' 
    AND log_time BETWEEN 
     CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND 
     CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30') 
GROUP BY log_date 

UNION ALL 

SELECT 
    log_type 
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date 
    ,count(*) AS total 
    ,count(DISTINCT subscriber_id) d 
FROM 
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index) 
WHERE 
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_CLICKED' 
    AND log_time BETWEEN 
     CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND 
     CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30') 
GROUP BY log_date; 
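
To make the grouping key concrete, here is a small sketch of what the hour expression evaluates to for a single timestamp. The sample value '2015-02-01 20:45:00' is made up for illustration, and the sketch assumes log_time is stored in UTC; the offsets are the ones used in the query.

```sql
-- Hypothetical sample timestamp; assumes log_time holds UTC values.
SELECT
    CONVERT_TZ('2015-02-01 20:45:00', '+00:00', '+05:30') AS local_time,
    DATE_FORMAT(CONVERT_TZ('2015-02-01 20:45:00', '+00:00', '+05:30'), '%l %p') AS log_date;
-- local_time: 2015-02-02 02:15:00, log_date: 2 AM
```

Because the key keeps only the hour and AM/PM, rows from every day in the date range that fall into the same local hour end up in the same group.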

Results

The query above returns output like this:

+---------------+-------+----------------+-------------+ 
| EMAIL_CLICKED | 1 AM  |             71 |          83 |
| EMAIL_CLICKED | 1 PM  |             25 |          27 |
| EMAIL_SENT    | 10 AM |             51 |          59 |
| EMAIL_OPENED  | 10 PM |             16 |          18 |
+---------------+-------+----------------+-------------+

This is the EXPLAIN result for the query above:

EXPLAIN

+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+ 
| id | select_type  | table         | type  | possible_keys                             | key                                       | key_len | ref  | rows   | Extra                                    |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+ 
|  1 | PRIMARY      | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL |  55074 | Using where; Using index; Using filesort |
|  2 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL | 330578 | Using where; Using index; Using filesort |
|  3 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL |   1589 | Using where; Using index; Using filesort |
|NULL| UNION RESULT | <union1,2,3>  | ALL   | NULL                                      | NULL                                      | NULL    | NULL | NULL   |                                          |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+ 

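Optimization?

We have a covering index on this table.

This query takes a long time (more than 1 minute).

If I remove the distinct count, COUNT(DISTINCT subscriber_id), from the query, we get the result within 1.5 seconds, but I need the distinct count of subscriber_id in the query.

Is there any way to optimize this query?

Thanks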

+0

Try using 'GROUP BY log_type,log_time' – LeGEC 2015-03-02 11:38:30

+0

@LeGEC Thanks for your comment. I need to group by hour; if I group by log_time, I don't get the desired output. – Rams 2015-03-02 11:43:32

+0

How does it affect performance if you restrict the query to one log type? How does it affect performance if you remove the distinct count? – 2015-03-02 11:47:30

Answers

3

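You are not dealing with a huge amount of data, so the group by should not take 40 seconds, assuming you are not on a very busy server with a lot of locking activity on the table.

Try this version of the query (limited to one log_type):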

SELECT log_type, 
     DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS time, 
     count(DISTINCT subscriber_id) AS distinct_count, 
     count(subscriber_id) AS total_count 
FROM stats.campaign_logs 
WHERE DOMAIN = 'xxxx' AND 
     campaign_id='1234' AND 
     log_type = 'EMAIL_SENT' AND 
     log_time BETWEEN CONVERT_TZ('2015-02-07 00:00:00','+00:00','+05:30') AND CONVERT_TZ('2015-02-14 23:59:58','+00:00','+05:30') 
GROUP BY time; 

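This should be optimized to use the index. If it is fast, then use union all to put the rows together. It is ugly, but sometimes union all is much faster than OR/IN because of index optimizations.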
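
Since the question reports that dropping the distinct count brings the runtime down to about 1.5 seconds, one more thing that may be worth trying on the single-type query is a two-step aggregation: first collapse the rows to one per (hour, subscriber) in a derived table, then count those rows. This is only a sketch; the names log_hour, hits and per_subscriber are made up for illustration, and whether it actually beats COUNT(DISTINCT) on MySQL 5.5 has to be measured on the real data.

```sql
-- Sketch only: log_hour, hits and per_subscriber are illustrative names.
SELECT log_type,
       log_hour,
       COUNT(subscriber_id) AS distinct_count,  -- one inner row per non-NULL subscriber
       SUM(hits)            AS total_count      -- adds up the per-subscriber row counts
FROM (
    SELECT log_type,
           DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_hour,
           subscriber_id,
           COUNT(subscriber_id) AS hits         -- NULL subscriber_id contributes 0
    FROM stats.campaign_logs
    WHERE domain = 'xxxx'
      AND campaign_id = '1234'
      AND log_type = 'EMAIL_SENT'
      AND log_time BETWEEN CONVERT_TZ('2015-02-07 00:00:00','+00:00','+05:30')
                       AND CONVERT_TZ('2015-02-14 23:59:58','+00:00','+05:30')
    GROUP BY log_type, log_hour, subscriber_id
) AS per_subscriber
GROUP BY log_type, log_hour;
```

The outer counts use COUNT(subscriber_id) rather than COUNT(*) so that rows with a NULL subscriber_id are ignored, matching count(DISTINCT subscriber_id) and count(subscriber_id) in the query above.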

+0

Hi @Gordon Linoff, thanks for your reply. I have updated my query and my question; please take a look. – Rams 2015-03-17 02:35:22

+0

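Hi @Gordon, I updated my query based on your suggestion, but it still takes a long time to return results. I removed the campaign_id and domain_id indexes from the table because I have a composite index that starts with domain_id and campaign_id. – Rams 2015-03-17 03:30:01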

-1
SELECT 
    log_type 
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date 
    ,count(*) AS total 
    ,count(DISTINCT subscriber_id) d 
FROM 
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index) 
WHERE 
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type IN ('EMAIL_OPENED','EMAIL_SENT','EMAIL_CLICKED') 
    AND log_time BETWEEN 
     CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND 
     CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30') 
GROUP BY log_date, log_type 

If I understand correctly, does this solve your problem?
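
Whether this single-statement IN version is faster than the UNION ALL form depends on how the optimizer uses the index, so it may help to compare the two plans. A minimal way to do that, assuming the same table and index as in the question, is to prefix the statement with EXPLAIN:

```sql
-- Same statement as above, prefixed with EXPLAIN to show the plan.
EXPLAIN
SELECT
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
    DOMAIN='xxx'
    AND campaign_id='123'
    AND log_type IN ('EMAIL_OPENED','EMAIL_SENT','EMAIL_CLICKED')
    AND log_time BETWEEN
     CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
     CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date, log_type;
```

In the output, key_len indicates how much of the index is used (468 in the plan shown in the question), and Extra shows whether "Using index" and "Using filesort" still appear.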