2010-09-19 97 views
1

Super slow MySQL query - need help!

I have a super slow query, which I have posted here: http://pastebin.com/E5sdRi7e. When I run EXPLAIN on it, I get the following:

id  select_type         table       type    possible_keys  key           key_len  ref                                 rows  Extra
1   PRIMARY             <derived2>  ALL     NULL           NULL          NULL     NULL                                5     Using filesort
2   DERIVED             Workflow    ALL     PRIMARY        NULL          NULL     NULL                                9     Using temporary; Using filesort
2   DERIVED             <derived3>  ALL     NULL           NULL          NULL     NULL                                141   Using where; Using join buffer
2   DERIVED             DataSource  ALL     PRIMARY        NULL          NULL     NULL                                1310  Using where; Using join buffer
2   DERIVED             <derived4>  ALL     NULL           NULL          NULL     NULL                                1310  Using where; Using join buffer
2   DERIVED             User        eq_ref  PRIMARY        PRIMARY       4        LatestDataSourceActivityLog.UserId  1
4   DERIVED             t1          ALL     NULL           NULL          NULL     NULL                                5400  Using where; Using temporary; Using filesort
5   DEPENDENT SUBQUERY  t2          ref     DataSourceId   DataSourceId  4        companyname_db.t1.DataSourceId      4
3   DERIVED             DataSource  range   PRIMARY        PRIMARY       4        NULL                                142   Using where

What is the table above telling me? Can it help me determine which fields should be indexed?

Any help is greatly appreciated.

The query:

SELECT WrappedData.* 
FROM (SELECT ParentLeafNodeDataSource.Id, 
       LatestDataSourceActivityLog.UserId, 
       DataSource.Status AS StatusCode, 
       (CASE 
        WHEN User.Name IS NULL THEN 'CompanyName' 
        ELSE User.Name 
       END)   AS `Username`, 
       Workflow.Name  AS WorkflowName, 
       LatestDataSourceActivityLog.Timestamp 
     FROM DataSource, 
       Workflow, 
       (SELECT * 
       FROM DataSource 
       WHERE DataSource.Id IN (0, 1, 2, 3, 
              4, 5, 6, 7, 
              8, 9, 10, 11, 
              12, 13, 16, 21, 
              22, 23, 24, 25, 
              26, 27, 28, 29, 
              30, 31, 32, 33, 
              34, 35, 36, 37, 
              38, 39, 40, 41, 
              42, 43, 44, 45, 
              46, 47, 48, 49, 
              50, 51, 52, 53, 
              54, 55, 56, 57, 
              58, 59, 60, 61, 
              62, 63, 64, 65, 
              66, 67, 68, 69, 
              70, 71, 72, 73, 
              74, 75, 76, 77, 
              78, 79, 80, 81, 
              83, 84, 85, 86, 
              87, 88, 89, 90, 
              91, 92, 93, 94, 
              95, 96, 97, 98, 
              99, 100, 101, 102, 
              103, 104, 105, 106, 
              107, 108, 109, 110, 
              111, 112, 113, 114, 
              115, 116, 117, 118, 
              119, 120, 142, 1293, 
              1294, 1295, 1296, 1297, 
              1298, 1299, 143, 1300, 
              1301, 1302, 1303, 1304, 
              1305, 1306, 144, 146, 
              145, 1307, 1308, 1309, 
              1310, 147, 149, 148, 
              150, 151)) AS ParentLeafNodeDataSource, 
       (SELECT t1.* 
       FROM DataSourceActivityLog AS t1 
       WHERE Timestamp = (SELECT Max(t2.Timestamp) 
            FROM DataSourceActivityLog AS t2 
            WHERE t1.DataSourceId = t2.DataSourceId) 
       GROUP BY t1.DataSourceId) AS LatestDataSourceActivityLog 
       LEFT JOIN User 
       ON User.Id = LatestDataSourceActivityLog.UserId 
     WHERE ParentLeafNodeDataSource.Status = '203' 
       OR ParentLeafNodeDataSource.Status = '204' 
        AND Workflow.Id = ParentLeafNodeDataSource.WorkflowId 
        AND LatestDataSourceActivityLog.DataSourceId = ParentLeafNodeDataSource.Id 
        AND DataSource.Id = LatestDataSourceActivityLog.DataSourceId 
        AND LatestDataSourceActivityLog.UserId = 1 
     GROUP BY ParentLeafNodeDataSource.Id) AS WrappedData 
ORDER BY WrappedData.`Timestamp` DESC 
+0

Can you paste the query into the question? Trend Micro blocks pastebin. – 2010-09-19 12:08:57

+3

Sorry, but this query just made my day :) CodeSOD (thedailywtf.com) – 2010-09-19 12:09:37

+2

@Martin: Sounds like a dumb piece of software. Best to get rid of it. – 2010-09-19 12:11:19

Answers

2

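It's hard to say anything conclusive, but here are a few things to refactor.

Performance-wise, the first thing to look at is the GROUP (aggregate) function, here MAX: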

  (SELECT t1.* 
      FROM DataSourceActivityLog AS t1 
      WHERE Timestamp = (SELECT Max(t2.Timestamp) 
           FROM DataSourceActivityLog AS t2 
           WHERE t1.DataSourceId = t2.DataSourceId) 
      GROUP BY t1.DataSourceId) AS LatestDataSourceActivityLog 

The use of MAX can be eliminated entirely:

  (SELECT t1.* 
      FROM DataSourceActivityLog AS t1 
      WHERE Timestamp = (SELECT t2.Timestamp 
           FROM DataSourceActivityLog AS t2 
           WHERE t1.DataSourceId = t2.DataSourceId 
           ORDER BY t2.Timestamp DESC 
           LIMIT 1) 
      GROUP BY t1.DataSourceId) AS LatestDataSourceActivityLog 

Probably not a big performance issue, but here you can use IFNULL or COALESCE instead of CASE:

(CASE 
    WHEN User.Name IS NULL THEN 'CompanyName' 
    ELSE User.Name 
END) 

becomes

IFNULL(User.Name, 'CompanyName')
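
A quick way to convince yourself that the two forms agree is to try them on literal values instead of the real User table. This is only a sketch; the name 'alice' is made up for illustration. The first two columns should come back as 'CompanyName' and the third as 'alice'.

```sql
-- Literal values stand in for User.Name; 'alice' is a hypothetical name.
SELECT
    IFNULL(NULL, 'CompanyName')    AS from_ifnull,    -- NULL is replaced
    COALESCE(NULL, 'CompanyName')  AS from_coalesce,  -- same result, standard SQL
    IFNULL('alice', 'CompanyName') AS non_null_kept;  -- non-NULL passes through
```

COALESCE is the standard SQL spelling and accepts more than two arguments; IFNULL is MySQL-specific and takes exactly two.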

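As for indexes: they improve SELECT performance by making lookups cheaper, but they slow down writes, because every index has to be updated as well. If your application is not write-heavy, you should index the columns you search on most often, especially in large tables.

In this query it looks like you would benefit from adding an index on DataSourceId, but I can't test whether there is any gain. Primary keys are already indexed.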
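
As a rough sketch of what that could look like: the EXPLAIN output in the question shows the t2 subquery already using a key named DataSourceId, so a single-column index presumably exists. A composite index that also covers Timestamp might let MySQL resolve the "latest row per DataSourceId" lookup from the index alone. The index name below is hypothetical, and whether it helps is an assumption you would have to verify with EXPLAIN.

```sql
-- Assumption: no composite index on (DataSourceId, Timestamp) exists yet.
-- idx_dsal_datasource_timestamp is a made-up name.
CREATE INDEX idx_dsal_datasource_timestamp
    ON DataSourceActivityLog (DataSourceId, `Timestamp`);

-- Check whether the optimizer picks it up for the per-DataSourceId maximum.
EXPLAIN
SELECT DataSourceId, MAX(`Timestamp`)
FROM DataSourceActivityLog
GROUP BY DataSourceId;
```

If the new plan still shows type ALL for DataSourceActivityLog, the index is not being used and can be dropped again with DROP INDEX.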

0

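Have you considered the MySQL Query Profiler?

That is how you will come to understand your performance problem.

Without this step, most people here will, sadly, prefer to make jokes about your query rather than try to help you.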
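
Assuming the profiler meant here is MySQL's built-in session profiling (available since 5.0.37), a minimal session could look like the sketch below. The query number 1 is an assumption; use whatever Query_ID SHOW PROFILES reports for the slow statement.

```sql
-- Turn on profiling for the current session only.
SET profiling = 1;

-- ... run the slow query here ...

-- List the profiled statements with their Query_ID and total duration.
SHOW PROFILES;

-- Break one statement down by execution stage (sending data, sorting, etc.).
SHOW PROFILE FOR QUERY 1;

-- Switch profiling off again when done.
SET profiling = 0;
```

The stage breakdown tells you where the time goes, but not which part of the SQL is responsible; for that you still need to read it side by side with the EXPLAIN output.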

+0

Is the question so bad that it's a joke? If so, can you point out what my mistakes are? – StackOverflowNewbie 2010-09-19 12:31:52

+0

C'mon Pierre. Why so serious?! – 2010-09-19 12:36:14

+0

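I ran the profiler. The suspicious entry is this: "Copying to tmp table   97.271238". Now, how do I find out which part of my SQL is causing this copying to a temporary table? – StackOverflowNewbie 2010-09-19 13:25:55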

1

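I would try the following:

  • The outer wrapper is completely useless; putting the ORDER BY in the inner query should work just the same.
  • Try rewriting the subqueries so that they are used as joins.
  • Then move the WHERE conditions into the relevant JOINs, so that the intermediate result sets become smaller.
  • Look at which indexes should be created for the columns used in WHERE and JOIN.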
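
To illustrate the second point in isolation, here is one common way to turn the correlated "latest log entry per data source" subquery into a join. It is a sketch using only the table and column names from the question; the aliases dsl and latest and the column alias MaxTimestamp are made up, and I have not run it against the real data.

```sql
-- The derived table computes one maximum per DataSourceId in a single pass,
-- instead of re-running a subquery for every row of the outer table.
SELECT dsl.*
FROM DataSourceActivityLog AS dsl
INNER JOIN (
    SELECT DataSourceId, MAX(`Timestamp`) AS MaxTimestamp
    FROM DataSourceActivityLog
    GROUP BY DataSourceId
) AS latest
    ON  latest.DataSourceId = dsl.DataSourceId
    AND latest.MaxTimestamp = dsl.`Timestamp`;
```

One difference to be aware of: if two log rows for the same DataSourceId share the maximum Timestamp, this returns both, whereas the original collapsed them with GROUP BY.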

A quick attempt (I don't know whether the results will be the same):

SELECT 
    dsa.Status AS StatusCode, 
    dsb.Id, 
    dsl.UserId, 
    dsl.Timestamp, 
    wf.Name AS WorkflowName, 
    COALESCE(u.Name, 'CompanyName') AS `Username` 
FROM 
    DataSource dsa 
    INNER JOIN DataSource dsb 
     ON dsb.Id IN (0, 1, 2, 3, 4, 5, 6, 7 /* etc. */) 
     -- parentheses are needed here: AND binds more tightly than OR 
     AND (dsb.Status = '203' OR dsb.Status = '204') 
    INNER JOIN DataSourceActivityLog dsl 
     ON dsl.DataSourceId = dsa.Id 
     AND dsl.DataSourceId = dsb.Id 
     AND dsl.UserId = 1 
     AND dsl.Timestamp = (
      -- latest log entry for this data source 
      SELECT MAX(dslt.Timestamp) 
      FROM DataSourceActivityLog AS dslt 
      WHERE dslt.DataSourceId = dsl.DataSourceId 
     ) 
    INNER JOIN Workflow wf 
     ON wf.Id = dsb.WorkflowId 
    LEFT JOIN User u 
     ON u.Id = dsl.UserId 
GROUP BY 
    dsl.Id 
ORDER BY 
    dsl.Timestamp DESC 

Perhaps also apply Zurahn's refactoring to the subquery, to get rid of the GROUP (aggregate) function there.

With indexes on:

  • DataSource.WorkflowId, DataSource.Status
  • DataSourceActivityLog.Timestamp, DataSourceActivityLog.UserId, DataSourceActivityLog.DataSourceId

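OK, actually, I have come to the conclusion that dsb (formerly ParentLeafNodeDataSource) really is the DataSource, and the WHERE clause can filter on it. Personally, I try to start from the DataSource and then join the rest of the data onto it. This usually gives a query in which it is easier to see what is actually being selected, rather than having a final JOIN suddenly reduce the result set. Reordering the JOINs achieves that, and it would look like this: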

SELECT 
    dsa.Status AS StatusCode, 
    dsb.Id, 
    dsl.UserId, 
    dsl.Timestamp, 
    wf.Name AS WorkflowName, 
    COALESCE(u.Name, 'CompanyName') AS `Username` 
FROM 
    DataSource dsb 
    INNER JOIN Workflow wf 
     ON dsb.WorkflowId = wf.Id 
    INNER JOIN DataSourceActivityLog dsl 
     ON dsl.DataSourceId = dsb.Id 
     AND dsl.UserId = 1 
     AND dsl.Timestamp = (
      -- latest log entry for this data source 
      SELECT MAX(dslt.Timestamp) 
      FROM DataSourceActivityLog AS dslt 
      WHERE dslt.DataSourceId = dsl.DataSourceId 
     ) 
    INNER JOIN DataSource dsa 
     ON dsl.DataSourceId = dsa.Id 
    LEFT JOIN User u 
     ON dsl.UserId = u.Id 
WHERE 
    dsb.Id IN (0, 1, 2, 3, 4, 5, 6, 7 /* etc. */) 
    -- parentheses are needed here: AND binds more tightly than OR 
    AND (dsb.Status = '203' OR dsb.Status = '204') 
GROUP BY 
    dsl.Id 
ORDER BY 
    dsl.Timestamp DESC 
+0

Thanks for the insights. Definitely something to look into. – StackOverflowNewbie 2010-09-19 12:55:04

+0

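I tried to clean up your SQL and ended up with this: http://pastebin.com/v57289YZ. phpMyAdmin complains: #1054 - Unknown column 'dsa.Id' in 'on clause'. Any idea what's wrong? It seems to be saying there is no dsa.Id, but I'm sure it exists in the database. – StackOverflowNewbie 2010-09-19 13:42:17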

+0

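That is probably because of the JOIN order in my initial version, where the alias dsa did not yet exist at the point where it was joined on. I have made some edits; please try my latest version. – ontrack 2010-09-19 13:53:00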