Query is still slow

I have a table with 2 million records, and the query is still slow.

Here is the table:

comments 
--------- 
    +-------------+---------------+------+-----+---------+----------------+
    | Field       | Type          | Null | Key | Default | Extra          |
    +-------------+---------------+------+-----+---------+----------------+
    | commentid   | int(11)       | NO   | PRI | NULL    | auto_increment |
    | parentid    | int(11)       | YES  |     | 0       |                |
    | refno       | int(11)       | YES  |     | 0       |                |
    | createdate  | int(11)       | YES  | MUL | 0       |                |
    | remoteip    | varchar(80)   | YES  |     |         |                |
    | fingerprint | varchar(50)   | YES  |     |         |                |
    | locid       | int(11)       | YES  | MUL | 0       |                |
    | clubid      | int(11)       | YES  |     | 0       |                |
    | profileid   | int(11)       | YES  | MUL | 0       |                |
    | userid      | int(11)       | YES  | MUL | 0       |                |
    | global      | int(11)       | YES  |     | 0       |                |
    | official    | int(11)       | YES  |     | 0       |                |
    | legacyuser  | int(11)       | YES  | MUL | 0       |                |
    | mediaid     | int(11)       | YES  |     | 0       |                |
    | status      | int(11)       | YES  |     | 1       |                |
    | comment     | varchar(4000) | YES  |     |         |                |
    | likes       | int(11)       | YES  |     | 0       |                |
    | dislikes    | int(11)       | YES  |     | 0       |                |
    | import      | int(11)       | YES  |     | 0       |                |
    | author      | varchar(50)   | YES  |     |         |                |
    +-------------+---------------+------+-----+---------+----------------+

Right now, with 2 million records, this query takes 6 to 7 seconds:

select * from comments where (locid=2085 or global=1) and status>0 order by createdate desc limit 20;

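I added an index on locid, but the query still takes 6 to 7 seconds to return results.

I could have used sqlfiddle, but it would be pointless: the question is fundamentally about performance, and I am not going to load 2 million records into sqlfiddle.

Is there any strategy or implementation that would bring this query into the 3-second range?

Thanks!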

UPDATE

Here is the SHOW CREATE TABLE output for the table:

 | comments | CREATE TABLE `comments` (
     `commentid` int(11) NOT NULL AUTO_INCREMENT, 
     `parentid` int(11) DEFAULT '0', 
     `refno` int(11) DEFAULT '0', 
     `createdate` int(11) DEFAULT '0', 
     `remoteip` varchar(80) DEFAULT '', 
     `fingerprint` varchar(50) DEFAULT '', 
     `locid` int(11) DEFAULT '0', 
     `clubid` int(11) DEFAULT '0', 
     `profileid` int(11) DEFAULT '0', 
     `userid` int(11) DEFAULT '0', 
     `global` int(11) DEFAULT '0', 
     `official` int(11) DEFAULT '0', 
     `legacyuser` int(11) DEFAULT '0', 
     `mediaid` int(11) DEFAULT '0', 
     `status` int(11) DEFAULT '1', 
     `comment` varchar(4000) DEFAULT '', 
     `likes` int(11) DEFAULT '0', 
     `dislikes` int(11) DEFAULT '0', 
     `import` int(11) DEFAULT '0', 
     `author` varchar(50) DEFAULT '', 
     PRIMARY KEY (`commentid`), 
     KEY `comments_locid` (`locid`), 
     KEY `comments_userid` (`userid`), 
     KEY `idx_legacyusers` (`legacyuser`), 
     KEY `profile_index` (`profileid`), 
     KEY `comments_createdate` (`createdate`), 
     KEY `compound_for_comments` (`locid`,`global`,`status`), 
     KEY `global` (`global`), 
     KEY `status` (`status`) 
    ) ENGINE=InnoDB AUTO_INCREMENT=3848451 DEFAULT CHARSET=latin1


2016-03-28 slicks1

There are good optimization tips (and diagnostic tools) worth checking out there; see https://dev.mysql.com/doc/refman/5.7/en/using-explain.html –

Obviously, every column used in the 'WHERE' clause has to be indexed in some way. Take a look at MySQL's 'EXPLAIN' feature. It helps you understand what is going on. – arkascha
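
For illustration, a minimal sketch of that suggestion: EXPLAIN prefixed to the query from the question. It uses only the table and values already shown. The plan MySQL reports depends on the indexes and data actually present, so no output is shown here.

```sql
-- Show the execution plan the optimizer chooses for the slow query.
EXPLAIN
SELECT *
FROM comments
WHERE (locid = 2085 OR global = 1)
  AND status > 0
ORDER BY createdate DESC
LIMIT 20;
```

In the result, the key column names the index chosen (NULL means none), rows is the optimizer's estimate of how many rows it will examine, and "Using filesort" in the Extra column means the ORDER BY is not being satisfied by an index.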

OK, I'll look into EXPLAIN. – slicks1

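Most databases, and MySQL in particular, cope poorly with OR.

You can get rid of the OR by turning the query into a UNION, with each half handling one side of the OR: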

select * from (
    select * from comments 
    where locid = 2085 
    and status > 0 
    union 
    select * from comments 
    where global = 1 
    and status > 0) x 
order by createdate desc 
limit 20


2016-03-28 16:13:26 Bohemian

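I believe the 'order by' is what makes it take so long. Remove the ordering and see whether the timing changes. You can sort by the primary key instead: later records are assigned larger primary key values, and because it is a key, sorting on it is faster. Another option is to use a storage engine that keeps data in memory rather than on disk.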


2016-03-28 15:28:24 BlackBrain

Try this:

select distinct * from (

    select * from (
     select * from comments where locid=2085 and status>0 order by commentid desc limit 20 
    ) t1 

    union all 

     select * from (
     select * from comments where global=1 and status>0 order by commentid desc limit 20 
    ) t2 

) t 
order by commentid desc 
limit 20

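Use it with indexes on (locid, status) and (global, status). (status, global) might be better than (global, status), depending on which column is more selective.

This only works if ordering by createdate is equivalent to ordering by commentid. Otherwise you would need indexes like (locid, status, createdate) and would have to order by createdate.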
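
As a rough sketch, the indexes described above could be created as follows. The index names are illustrative and do not come from the original post; the second statement is only needed in the case where commentid order does not match createdate order.

```sql
-- Indexes for the two branches of the UNION ALL query above.
-- Index names are hypothetical.
ALTER TABLE comments
    ADD INDEX idx_locid_status (locid, status),
    ADD INDEX idx_global_status (global, status);

-- Variant for when commentid order does not match createdate order:
-- include createdate and change the subqueries to ORDER BY createdate DESC.
ALTER TABLE comments
    ADD INDEX idx_locid_status_createdate (locid, status, createdate),
    ADD INDEX idx_global_status_createdate (global, status, createdate);
```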


2016-03-28 16:13:57

This looks good. In the future, you could use UNION instead of SELECT DISTINCT over a UNION ALL. – slicks1

Always put the '=' parts first. That is, '(global, status)' is more useful than '(status, global)'. –

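@slicks1, 'UNION' has to sort the result to remove duplicates. 'DISTINCT' has to do the same. In my case I am already explicitly sorting the result in the outer select, so I would rather use 'DISTINCT' there. But since my subselects are limited to 40 rows in total, it doesn't really matter. –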

This is likely to be even faster than the queries in the two answers so far:

SELECT c.* 
    FROM ( 
       (SELECT commentid, createdate 
        FROM comments 
        WHERE locid=2085 
         AND status > 0 
        ORDER BY createdate DESC 
        LIMIT 20 
      ) 
      UNION DISTINCT 
       (SELECT commentid, createdate 
        FROM comments 
        WHERE global=1 
         AND status > 0 
        ORDER BY createdate DESC 
        LIMIT 20 
      ) 
      ORDER BY createdate DESC 
      LIMIT 20 
     ) x 
    JOIN comments c USING (commentid);

With these two "covering" indexes:

INDEX(locid, status, createdate, commentid) 
INDEX(global, status, createdate, commentid)

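(Based on later information) Since (global = 1) is usually true and (status > 0) is usually false, the following may be better. (There is a question of whether DESC throws a monkey wrench into this.)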

INDEX(locid, createdate, status, commentid) 
INDEX(global, createdate, status, commentid)

global is still a risk. If it is "usually" 1, then the indexes above may not be optimal.

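Because the subqueries are satisfied entirely from the index ("covering") instead of hauling around all the columns (*), this formulation will be faster. It does require an extra SELECT, but with only 20 rows looked up by PRIMARY KEY, that JOIN is efficient. If your table grows too large to be cached, this will be a big performance bonus.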

I explicitly used UNION DISTINCT on the assumption that you would otherwise get duplicates. If not, UNION ALL would be faster.
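
A minimal sketch of how the first pair of covering indexes from this answer might be added and then checked. The index names are made up for the example, and the EXPLAIN simply reuses one branch of the query above.

```sql
-- First pair of covering indexes from this answer; index names are hypothetical.
ALTER TABLE comments
    ADD INDEX idx_loc_cover (locid, status, createdate, commentid),
    ADD INDEX idx_global_cover (global, status, createdate, commentid);

-- Check one branch: "Using index" in the Extra column means the subquery
-- is answered from the index alone, without reading full rows.
EXPLAIN
SELECT commentid, createdate
FROM comments
WHERE locid = 2085
  AND status > 0
ORDER BY createdate DESC
LIMIT 20;
```

The same check applies to the global = 1 branch and to the alternative column order (locid, createdate, status, commentid).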

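Schema critique:

Use suitably sized INTs: INT is 4 bytes; TINYINT is only 1; and so on.
Use UNSIGNED where appropriate (especially for IDs and counts).
Use NOT NULL where appropriate.
Do not index flags (global? status?) by themselves; such an index is unlikely to be used.
How many distinct values does status have? If status>0 can be replaced by status=1, the indexes I suggested will work better.

Making the data smaller will probably speed up this query (and others).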
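
The critique above could translate into something like the following sketch. It rests on assumptions the post does not confirm: that these columns contain no NULLs, that global is a 0/1 flag, that status fits in a signed TINYINT, and that likes and dislikes are never negative. Verify each against the real data before running anything similar.

```sql
-- Assumptions (not stated in the post): no existing NULLs in these columns,
-- global is a 0/1 flag, status fits in -128..127, likes/dislikes are never negative.
ALTER TABLE comments
    MODIFY `global`  TINYINT UNSIGNED NOT NULL DEFAULT 0,
    MODIFY `status`  TINYINT NOT NULL DEFAULT 1,
    MODIFY likes     INT UNSIGNED NOT NULL DEFAULT 0,
    MODIFY dislikes  INT UNSIGNED NOT NULL DEFAULT 0,
    DROP INDEX `global`,  -- single-column index on a flag
    DROP INDEX `status`;  -- single-column index on a flag
```

The two DROP INDEX clauses refer to the single-column indexes named global and status in the table definition from the question.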


2016-03-28 18:36:34

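I would agree if the subselects were not limited to 20 rows. But with LIMIT 20 the difference isn't really measurable. More important is that the index can support all the conditions and the sort order. By the way, MySQL cannot use an index for 'ORDER BY .. DESC' if the required column is not in first position. So the 'LIMIT' in the subqueries (in my answer) may be almost useless, and Bohemian's answer could perform equally well with less complexity. –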

@PaulSpiegel - "cannot use an index .. DESC" - I think it is more complicated than that. There are cases where the index is effective. –

'FLUSH STATUS; SELECT ...; SHOW SESSION STATUS LIKE 'Handler%';' is a handy way to see whether only about 20 rows were read or the entire set was scanned. –
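
Spelled out, that check might look like the sketch below, with the original query from the question standing in for the elided SELECT. Any of the rewritten queries can be substituted.

```sql
-- Reset the session's status counters.
FLUSH STATUS;

-- Run the query being measured (here, the original query from the question).
SELECT *
FROM comments
WHERE (locid = 2085 OR global = 1)
  AND status > 0
ORDER BY createdate DESC
LIMIT 20;

-- Show how many row reads the storage engine performed for that query.
SHOW SESSION STATUS LIKE 'Handler%';
```

Handler_read counters in the neighborhood of 20 suggest the scan stopped early, while counts close to the table size suggest the whole set was read.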
