2017-03-05 154 views
0

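Optimising MySQL - JOIN vs nested queries

I've been trying to optimise some SQL queries on the assumption that joining tables is more efficient than using nested queries. I'm joining the same table multiple times to perform different analyses on the data.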

I have 2 tables:

transactions:

id | date_add   | merchant_id | transaction_type | amount
1  | 1488733332 | 108         | add              | 20.00
2  | 1488733550 | 108         | remove           | 5.00

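The second is a calendar table that just lists dates, so that I can produce empty records for days on which there were no transactions:

calendar: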

id | datefield
1  | 2017-03-01
2  | 2017-03-02
3  | 2017-03-03
4  | 2017-03-04

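I have a few thousand rows, and I'm trying to get a yearly summary of total transactions and of each type of transaction per month (i.e. 12 rows in total), where

  • transactions = sum of all "amount" values,
  • additions = sum of all "amount" values where transaction_type = "add"
  • redemptions = sum of all "amount" values where transaction_type = "remove"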

Result:

month | transactions | additions | redemptions
Jan   | 15           | 12        | 3
Feb   | 20           | 15        | 5
...

My initial query looks like this:

SELECT COALESCE(tr.transactions, 0) AS transactions, 
     COALESCE(ad.additions, 0) AS additions, 
     COALESCE(re.redemptions, 0) AS redemptions, 
     calendar.date 
FROM (SELECT DATE_FORMAT(datefield, '%b %Y') AS date FROM calendar WHERE datefield LIKE '2017-%' GROUP BY YEAR(datefield), MONTH(datefield)) AS calendar 
LEFT JOIN (SELECT COUNT(transaction_type) as transactions, from_unixtime(date_add, '%b %Y') as date_t FROM transactions WHERE merchant_id = 108 GROUP BY from_unixtime(date_add, '%b %Y')) AS tr 
ON calendar.date = tr.date_t 
LEFT JOIN (SELECT COUNT(transaction_type = 'add') as additions, from_unixtime(date_add, '%b %Y') as date_a FROM transactions WHERE merchant_id = 108 AND transaction_type = 'add' GROUP BY from_unixtime(date_add, '%b %Y')) AS ad 
ON calendar.date = ad.date_a 
LEFT JOIN (SELECT COUNT(transaction_type = 'remove') as redemptions, from_unixtime(date_add, '%b %Y') as date_r FROM transactions WHERE merchant_id = 108 AND transaction_type = 'remove' GROUP BY from_unixtime(date_add, '%b %Y')) AS re 
ON calendar.date = re.date_r 

I tried to optimise it and clean it up a little by removing the nested statements, and came up with this:

SELECT 
    DATE_FORMAT(cal.datefield, '%b %d') as date, 
    IFNULL(count(ct.amount),0) as transactions, 
    IFNULL(count(a.amount),0) as additions, 
    IFNULL(count(r.amount),0) as redeptions 
FROM calendar as cal 
LEFT JOIN transactions as ct ON cal.datefield = date(from_unixtime(ct.date_add)) && ct.merchant_id = 108 
LEFT JOIN transactions as r ON r.id = ct.id && r.transaction_type = 'remove' 
LEFT JOIN transactions as a ON a.id = ct.id && a.transaction_type = 'add' 
WHERE cal.datefield like '2017-%' 
GROUP BY month(cal.datefield) 

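I was surprised to see that the revised statement was about 20 times slower on my dataset. Have I missed some piece of logic? Given that I'm joining the same table multiple times, is there a better way to achieve the same result with a more streamlined query?

EDIT: To further explain the result I'm looking for: I'd like one row for each month of the year (12 rows), each containing the total transactions, total additions and total redemptions for that month.

With the first query I get a result in 0.5 seconds, but the second query takes 9.5 seconds.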

+0

Can you add the EXPLAIN output and the results for both the optimised and the non-optimised query? –

+0

I really wouldn't bother with a calendar table for this – Strawberry

+0

I see you used && in the ON clauses of the LEFT JOINs in the second query? They should be AND –

Answers

3

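Looking at the query, you could use a single LEFT JOIN with CASE WHEN, aggregating the transactions per month and joining them to the months taken from the calendar table: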

SELECT COALESCE(t.transactions, 0) AS transactions, 
     COALESCE(t.additions, 0) AS additions, 
     COALESCE(t.redemptions, 0) AS redemptions, 
     calendar.date 
FROM (SELECT DATE_FORMAT(datefield, '%b %Y') AS date 
      FROM calendar 
      WHERE datefield LIKE '2017-%' 
      GROUP BY YEAR(datefield), MONTH(datefield)) AS calendar 
LEFT JOIN 
(select 
     COUNT(transaction_type) as transactions 
     , sum(case when transaction_type = 'add' then 1 else 0 end) as additions 
     , sum(case when transaction_type = 'remove' then 1 else 0 end) as redemptions 
     , from_unixtime(date_add, '%b %Y') as date_t 
     FROM transactions 
     WHERE merchant_id = 108 
     GROUP BY from_unixtime(date_add, '%b %Y')
) t ON calendar.date = t.date_t
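
As a rough illustration of the conditional aggregation used above, here is a minimal sketch that runs against an in-memory SQLite database rather than MySQL. That substitution is an assumption made so the example is self-contained: strftime(..., 'unixepoch') stands in for from_unixtime(), the month is formatted as YYYY-MM, and all rows apart from the two shown in the question are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, date_add INTEGER, merchant_id INTEGER, "
    "transaction_type TEXT, amount REAL)"
)
# Rows 1 and 2 come from the question; rows 3 and 4 are hypothetical.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1488733332, 108, "add", 20.00),     # 2017-03-05 UTC
        (2, 1488733550, 108, "remove", 5.00),   # 2017-03-05 UTC
        (3, 1491500000, 108, "add", 7.50),      # 2017-04-06 UTC
        (4, 1491500000, 999, "add", 1.00),      # another merchant, filtered out
    ],
)

# A single pass over the table: each CASE expression counts only its own type,
# so no separate subquery per transaction type is needed.
rows = conn.execute(
    """
    SELECT strftime('%Y-%m', date_add, 'unixepoch') AS month,
           COUNT(transaction_type) AS transactions,
           SUM(CASE WHEN transaction_type = 'add' THEN 1 ELSE 0 END) AS additions,
           SUM(CASE WHEN transaction_type = 'remove' THEN 1 ELSE 0 END) AS redemptions
    FROM transactions
    WHERE merchant_id = 108
    GROUP BY month
    ORDER BY month
    """
).fetchall()

for row in rows:
    print(row)
```

The point of the design is that the transactions table is read once instead of once per subquery, which is consistent with the drop from three derived tables to one in the query above.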
+1

@Mihai ... thanks a lot for the ON correction .. – scaisEdge

+0

Thanks @scaisEdge, CASE WHEN is a revelation - muchos gracias! This brought it down from 0.5 seconds to 0.2 – contool

+1

The first SUM can be simplified to SUM(transaction_type = 'add'). –
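
A small sketch of that simplification, again using SQLite in place of MySQL as an assumption for runnability (both evaluate a comparison to 1 or 0); the three sample rows are hypothetical. One caveat: for a NULL transaction_type the comparison yields NULL, which SUM skips, whereas the CASE form adds 0, so the two only differ when no non-NULL rows exist in a group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, transaction_type TEXT)"
)
# Hypothetical rows: two additions and one removal.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, "add"), (2, "remove"), (3, "add")],
)

# Both expressions count the rows whose type is 'add'.
case_sum, bool_sum = conn.execute(
    """
    SELECT SUM(CASE WHEN transaction_type = 'add' THEN 1 ELSE 0 END),
           SUM(transaction_type = 'add')
    FROM transactions
    """
).fetchone()

print(case_sum, bool_sum)
```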

0

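First, I would create a derived table with a timestamp range for each month from the calendar table. That way, the join with the transactions table will be efficient if date_add is indexed.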

select month(c.datefield) as month, 
     unix_timestamp(timestamp(min(c.datefield), '00:00:00')) as ts_from, 
     unix_timestamp(timestamp(max(c.datefield), '23:59:59')) as ts_to 
from calendar c 
where c.datefield between '2017-01-01' and '2017-12-31' 
group by month(c.datefield) 
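
To illustrate why a plain range on date_add matters, here is a hedged sketch using SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN. The index name idx_date_add and the helper plan() are hypothetical, and the exact plan text differs between engines and versions, so treat this as a demonstration of the general principle and check your own server with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, date_add INTEGER,
                               merchant_id INTEGER, transaction_type TEXT,
                               amount REAL);
    CREATE INDEX idx_date_add ON transactions (date_add);  -- hypothetical index name
    """
)


def plan(sql):
    """Return the plan detail strings SQLite reports for a statement."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Range predicate on the bare column: the index on date_add can be used.
range_plan = plan(
    "SELECT amount FROM transactions "
    "WHERE date_add BETWEEN 1488326400 AND 1491004799"
)

# Column wrapped in a function: the plain index cannot serve the lookup,
# so every row has to be examined.
wrapped_plan = plan(
    "SELECT amount FROM transactions "
    "WHERE date(date_add, 'unixepoch') = '2017-03-05'"
)

print(range_plan)
print(wrapped_plan)
```

The second form mirrors the join condition cal.datefield = date(from_unixtime(ct.date_add)) from the question, which applies a function to the column on every comparison.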

Join it with the transactions table and use conditional aggregation to get your data:

select c.month, 
     sum(t.amount) as transactions, 
     sum(case when t.transaction_type = 'add' then t.amount else 0 end) as additions, 
     sum(case when t.transaction_type = 'remove' then t.amount else 0 end) as redemptions 
from (
    select month(c.datefield) as m, date_format(c.datefield, '%b') as `month`,
      unix_timestamp(timestamp(min(c.datefield), '00:00:00')) as ts_from, 
      unix_timestamp(timestamp(max(c.datefield), '23:59:59')) as ts_to 
    from calendar c 
    where c.datefield between '2017-01-01' and '2017-12-31' 
    group by month(c.datefield), date_format(c.datefield, '%b') 
) c 
left join transactions t on t.date_add between c.ts_from and c.ts_to 
where t.merchant_id = 108 
group by c.m, c.month 
order by c.m 
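
One thing worth checking in a query of this shape: a WHERE condition on the right-hand table of a LEFT JOIN discards the NULL-extended rows, so months without transactions disappear and the join behaves like an inner join. The sketch below shows the effect on SQLite with made-up data; the months table is a hypothetical stand-in for the derived table c, holding UTC ranges for February and March 2017.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE months (m INTEGER, ts_from INTEGER, ts_to INTEGER);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, date_add INTEGER,
                               merchant_id INTEGER, transaction_type TEXT,
                               amount REAL);
    -- Hypothetical data: February has no transactions, March has two.
    INSERT INTO months VALUES (2, 1485907200, 1488326399),
                              (3, 1488326400, 1491004799);
    INSERT INTO transactions VALUES (1, 1488733332, 108, 'add', 20.00),
                                    (2, 1488733550, 108, 'remove', 5.00);
    """
)

# Merchant filter in WHERE: rows with t.merchant_id IS NULL are removed,
# so the empty month is lost.
filter_in_where = conn.execute(
    """
    SELECT c.m, COALESCE(SUM(t.amount), 0)
    FROM months c
    LEFT JOIN transactions t ON t.date_add BETWEEN c.ts_from AND c.ts_to
    WHERE t.merchant_id = 108
    GROUP BY c.m
    ORDER BY c.m
    """
).fetchall()

# Merchant filter in ON: unmatched months survive with a total of 0.
filter_in_on = conn.execute(
    """
    SELECT c.m, COALESCE(SUM(t.amount), 0)
    FROM months c
    LEFT JOIN transactions t ON t.date_add BETWEEN c.ts_from AND c.ts_to
                            AND t.merchant_id = 108
    GROUP BY c.m
    ORDER BY c.m
    """
).fetchall()

print(filter_in_where)
print(filter_in_on)
```

If all 12 months must appear in the result, moving the merchant_id condition into the ON clause is one way to keep them.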
+0

It would probably be faster if you made the ORDER BY match the GROUP BY. This _may_ avoid an extra sort. –

+1

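@RickJames Do you think the optimizer cannot "see" that the ORDER BY is a subset of the GROUP BY? It doesn't matter here anyway, because the result set contains only 12 rows. I have another problem here, though. This query takes 2 seconds on a table with 1M rows (~330K rows for 2014). But changed to an inner join, it runs in 100 ms. So I need to put the code into another subquery. The MySQL optimizer is sometimes really stupid. –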