2009-10-12 71 views
1

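Efficient MySQL query to find entries in A that have no match in B

I have two tables (products and suppliers) and want to find out which items are no longer listed in the supplier table.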

Table uc_products holds the products. Table uc_supplier_csv holds the supplier stock. uc_products.model joins against uc_supplier_csv.sku.

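I am seeing very long query times when trying to identify stock in the products table that is not referred to in the supplier table. I only want to extract the nid of the matching entries; the sid IS NULL condition is just there so I can identify which items have no supplier.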

For the first query below, it takes the database server (4GB RAM / 2x 2.4GHz Intel) an hour to return a result (507 rows). I didn't wait for the second query to finish.

How can I make this query more efficient? Could the problem be the mismatched character sets?

I was thinking that the following would be the most efficient SQL to use:

SELECT nid, sid
FROM uc_products p
LEFT OUTER JOIN uc_supplier_csv c
    ON p.model = c.sku
WHERE sid IS NULL;

For this query, I get the following EXPLAIN result:

mysql> EXPLAIN SELECT nid, sid FROM uc_products p LEFT OUTER JOIN uc_supplier_csv c ON p.model = c.sku WHERE sid IS NULL; 
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra                   |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+
|  1 | SIMPLE      | p     | ALL  | NULL          | NULL | NULL    | NULL |   6526 |                         |
|  1 | SIMPLE      | c     | ALL  | NULL          | NULL | NULL    | NULL | 126639 | Using where; Not exists |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------------------+
2 rows in set (0.00 sec) 

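I would have thought the keys idx_sku and idx_model could be used here, but they are not. Is that because the tables' default character sets do not match? One is UTF-8, the other is latin1.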
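
One way to test the character-set hypothesis is to look at the collation of the two join columns and then give them the same one. The statements below are only a sketch and have not been run against this data. They assume it is acceptable to convert uc_supplier_csv to utf8; the ALTER rewrites the table and rebuilds its indexes, so it is worth trying on a copy first.

```sql
-- Show the collation actually applied to each join column
-- (see the Collation field of the output).
SHOW FULL COLUMNS FROM uc_products LIKE 'model';
SHOW FULL COLUMNS FROM uc_supplier_csv LIKE 'sku';

-- Assumption: nothing else depends on uc_supplier_csv staying latin1.
-- Converts every character column in the table and rebuilds the indexes.
ALTER TABLE uc_supplier_csv
    CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

-- Re-check the plan for the anti-join once both columns share a collation.
EXPLAIN
SELECT p.nid
FROM uc_products p
LEFT OUTER JOIN uc_supplier_csv c
    ON p.model = c.sku
WHERE c.sid IS NULL;
```

If the mismatch is what blocks the index, the row for c in the new plan should list idx_sku under key instead of NULL.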

I have also considered this form:

SELECT nid
FROM uc_products
WHERE model NOT IN (
    SELECT DISTINCT sku
    FROM uc_supplier_csv
);

EXPLAIN shows the following result for that query:

mysql> explain select nid from uc_products where model not in (select sku from uc_supplier_csv) ; 
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
| id | select_type        | table           | type  | possible_keys         | key     | key_len | ref  | rows   | Extra                    |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | uc_products     | ALL   | NULL                  | NULL    | NULL    | NULL |   6520 | Using where              |
|  2 | DEPENDENT SUBQUERY | uc_supplier_csv | index | idx_sku,idx_sku_stock | idx_sku | 258     | NULL | 126639 | Using where; Using index |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec) 

And, just so I don't leave anything out, here are some more exciting details: the table sizes and statistics, and the table structures :)

mysql> show table status where Name in ('uc_supplier_csv', 'uc_products') ; 
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
| Name            | Engine | Version | Row_format | Rows   | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time          | Collation         | Checksum | Create_options | Comment |
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+
| uc_products     | MyISAM |      10 | Dynamic    |   6520 |             89 |      585796 | 281474976710655 |       232448 |       912 |           NULL | 2009-04-24 11:03:15 | 2009-10-12 14:23:43 | 2009-04-24 11:03:16 | utf8_general_ci   |     NULL |                |         |
| uc_supplier_csv | MyISAM |      10 | Dynamic    | 126639 |             26 |     3399704 | 281474976710655 |      5864448 |         0 |           NULL | 2009-10-12 14:28:25 | 2009-10-12 14:28:25 | 2009-10-12 14:28:27 | latin1_swedish_ci |     NULL |                |         |
+-----------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------+

CREATE TABLE `uc_products` (
    `vid` mediumint(9) NOT NULL default '0', 
    `nid` mediumint(9) NOT NULL default '0', 
    `model` varchar(255) NOT NULL default '', 
    `list_price` decimal(10,2) NOT NULL default '0.00', 
    `cost` decimal(10,2) NOT NULL default '0.00', 
    `sell_price` decimal(10,2) NOT NULL default '0.00', 
    `weight` float NOT NULL default '0', 
    `weight_units` varchar(255) NOT NULL default 'lb', 
    `length` float unsigned NOT NULL default '0', 
    `width` float unsigned NOT NULL default '0', 
    `height` float unsigned NOT NULL default '0', 
    `length_units` varchar(255) NOT NULL default 'in', 
    `pkg_qty` smallint(5) unsigned NOT NULL default '1', 
    `default_qty` smallint(5) unsigned NOT NULL default '1', 
    `unique_hash` varchar(32) NOT NULL, 
    `ordering` tinyint(2) NOT NULL default '0', 
    `shippable` tinyint(2) NOT NULL default '1', 
    PRIMARY KEY (`vid`), 
    KEY `idx_model` (`model`) 
) ENGINE=MyISAM DEFAULT CHARSET=utf8 

CREATE TABLE `uc_supplier_csv` (
    `sid` int(10) unsigned NOT NULL default '0', 
    `sku` varchar(255) default NULL, 
    `stock` int(10) unsigned NOT NULL default '0', 
    `list_price` decimal(8,2) default '0.00', 
    KEY `idx_sku` (`sku`), 
    KEY `idx_stock` (`stock`), 
    KEY `idx_sku_stock` (`sku`,`stock`), 
    KEY `idx_sid` (`sid`) 
) ENGINE=MyISAM DEFAULT CHARSET=latin1 

EDIT: Adding the query plans for a couple of the queries Martin suggested below:

mysql> explain SELECT nid FROM uc_products p WHERE NOT EXISTS (SELECT 1 FROM uc_supplier_csv c WHERE p.model = c.sku) ; 
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type        | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | p     | ALL   | NULL          | NULL    | NULL    | NULL |   6526 | Using where              |
|  2 | DEPENDENT SUBQUERY | c     | index | NULL          | idx_sku | 258     | NULL | 126639 | Using where; Using index |
+----+--------------------+-------+-------+---------------+---------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec) 

mysql> explain SELECT nid FROM uc_products WHERE model NOT IN (SELECT sku FROM uc_supplier_csv) ; 
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
| id | select_type        | table           | type  | possible_keys         | key     | key_len | ref  | rows   | Extra                    |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | uc_products     | ALL   | NULL                  | NULL    | NULL    | NULL |   6526 | Using where              |
|  2 | DEPENDENT SUBQUERY | uc_supplier_csv | index | idx_sku,idx_sku_stock | idx_sku | 258     | NULL | 126639 | Using where; Using index |
+----+--------------------+-----------------+-------+-----------------------+---------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec) 
+2

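Your use of HAVING in the first query is incorrect: since there is no GROUP BY, it should be a simple WHERE. Not sure why MySQL doesn't give you an error message, but I imagine that is what's messing up the query plan! – 2009-10-12 04:28:58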

+0

Thanks Alex, updated. – 2009-10-12 09:01:13

+0

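Yesterday I tested the four query forms on this page on my laptop (MBP 2.4GHz/4GB/OSX/MAMP MySQL). * The LEFT OUTER JOIN form above took 3526s to execute. * The subquery form above took 1021s. * Martin's suggestion below took 637s. * James's was slightly faster than Martin's, but returned different results from the other three forms. – 2009-10-12 20:00:49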

Answers

3

Perhaps try using NOT EXISTS rather than a count? For example:

SELECT nid
FROM uc_products p
WHERE NOT EXISTS (
    SELECT 1
    FROM uc_supplier_csv c
    WHERE p.model = c.sku
)

SO user Quassnoi has a short article outlining some tests, which suggests this might also be worth a try:

SELECT nid
FROM uc_products
WHERE model NOT IN (
    SELECT sku
    FROM uc_supplier_csv
)

This is basically your original query, without the DISTINCT.
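
One thing to watch with the NOT IN form: sku is declared as varchar(255) default NULL, and if the subquery returns even a single NULL, the NOT IN test can never be true, so the query returns no rows at all. A minimal sketch that guards against this, assuming rows with a NULL sku should simply be ignored:

```sql
-- x NOT IN (..., NULL) evaluates to NULL or FALSE, never TRUE,
-- so NULL skus are filtered out inside the subquery.
SELECT nid
FROM uc_products
WHERE model NOT IN (
    SELECT sku
    FROM uc_supplier_csv
    WHERE sku IS NOT NULL
)
```

The NOT EXISTS form does not need this guard, because a NULL sku never satisfies p.model = c.sku.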

Another one for you Chris, this time with a conversion to help with the cross-encoding join:

SELECT nid
FROM uc_products p
WHERE NOT EXISTS (
    SELECT 1
    FROM uc_supplier_csv c
    WHERE CONVERT(p.model USING latin1) = c.sku
)
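
The same conversion can in principle be applied to the LEFT OUTER JOIN form from the question. This is an untimed sketch: it assumes every model value can be represented in latin1, since characters with no latin1 equivalent are replaced with '?' during conversion and could produce false matches.

```sql
-- Anti-join with the comparison done in latin1, the character set of c.sku,
-- which gives the optimizer a chance to use idx_sku for the lookup.
SELECT p.nid
FROM uc_products p
LEFT OUTER JOIN uc_supplier_csv c
    ON c.sku = CONVERT(p.model USING latin1)
WHERE c.sid IS NULL
```

Running EXPLAIN on it shows whether idx_sku is actually chosen on a given server.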
+0

Of the suggested solutions that return correct results, this query is the fastest. It executed in 637 seconds. – 2009-10-12 19:56:48

+0

What does the query plan look like? – 2009-10-12 21:03:22

+0

Added to the question (formatting doesn't seem to work for me in comments?) – 2009-10-13 04:38:38