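MySQL optimization needed for complex search on EAV structured data
I have a large database of EAV-structured data that has to be searchable and pageable. I have tried every trick in the book to make it fast enough, but in some cases it still doesn't finish in a reasonable time.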
Here is my table structure (only the relevant parts; ask if you need more):
CREATE TABLE IF NOT EXISTS `object` (
`object_id` bigint(20) NOT NULL AUTO_INCREMENT,
`oid` varchar(32) CHARACTER SET utf8 NOT NULL,
`status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
`created` datetime NOT NULL,
`updated` datetime NOT NULL,
PRIMARY KEY (`object_id`),
UNIQUE KEY `oid` (`oid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `version` (
`version_id` bigint(20) NOT NULL AUTO_INCREMENT,
`type_id` bigint(20) NOT NULL,
`object_id` bigint(20) NOT NULL,
`created` datetime NOT NULL,
`status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`version_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `value` (
`value_id` bigint(20) NOT NULL AUTO_INCREMENT,
`object_id` int(11) NOT NULL,
`attribute_id` int(11) NOT NULL,
`version_id` bigint(20) NOT NULL,
`type_id` bigint(20) NOT NULL,
`value` text NOT NULL,
PRIMARY KEY (`value_id`),
KEY `field_id` (`attribute_id`),
KEY `action_id` (`version_id`),
KEY `form_id` (`type_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
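This is a sample object. I have about 1 million of them in my database. Each object can have a different number of attributes, each with its own attribute_id.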
INSERT INTO `object` (`object_id`, `oid`, `status`, `created`, `updated`) VALUES (1, 'cwnzrdxs4dzxns47xs4tx', 'Green', NOW(), NOW());
INSERT INTO `version` (`version_id`, `type_id`, `object_id`, `created`, `status`) VALUES (1, 1, 1, NOW(), NULL);
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (1, 1, 1, 1, 1, 'Munich');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (2, 1, 2, 1, 1, 'Germany');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (3, 1, 3, 1, 1, '123');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (4, 1, 4, 1, 1, '2012-01-13');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (5, 1, 5, 1, 1, 'A cake!');
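To make the EAV layout concrete, here is a minimal sketch, assuming Python's built-in `sqlite3` module as an in-memory stand-in for MySQL and a simplified `value` table. It loads the five sample rows and pivots them back into one wide row using conditional aggregation (`MAX(CASE WHEN ... END)`). The output column names `attr_1` to `attr_5` are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL `value` table (types simplified).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE value (
        value_id     INTEGER PRIMARY KEY,
        object_id    INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL,
        version_id   INTEGER NOT NULL,
        type_id      INTEGER NOT NULL,
        value        TEXT    NOT NULL
    )
""")
# The five sample rows: one row per attribute of version 1 of object 1.
conn.executemany(
    "INSERT INTO value VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, 1, "Munich"),
        (2, 1, 2, 1, 1, "Germany"),
        (3, 1, 3, 1, 1, "123"),
        (4, 1, 4, 1, 1, "2012-01-13"),
        (5, 1, 5, 1, 1, "A cake!"),
    ],
)

# Pivot: collapse the attribute rows of one version into a single wide row,
# one column per attribute_id (column names are hypothetical).
row = conn.execute("""
    SELECT version_id,
           MAX(CASE WHEN attribute_id = 1 THEN value END) AS attr_1,
           MAX(CASE WHEN attribute_id = 2 THEN value END) AS attr_2,
           MAX(CASE WHEN attribute_id = 3 THEN value END) AS attr_3,
           MAX(CASE WHEN attribute_id = 4 THEN value END) AS attr_4,
           MAX(CASE WHEN attribute_id = 5 THEN value END) AS attr_5
    FROM value
    WHERE version_id = 1
    GROUP BY version_id
""").fetchone()
print(row)  # (1, 'Munich', 'Germany', '123', '2012-01-13', 'A cake!')
```

Every attribute that becomes a column needs its own `CASE` expression, which is why filtering on many attributes at once turns into many joins or many aggregates.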
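Now to my current mechanism. My first attempt was the typical MySQL approach: one huge SQL query with as many joins as needed. A complete disaster! It took ages to load and even crashed both PHP and the MySQL server by exhausting memory.
So I split my query into several steps: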
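1. Determine all required attribute_ids.
I can look them up in another table that references the object's type_id. The result is a list of attribute_ids. (This table is not relevant to performance, so it isn't included in my example.)
:type_id contains all type_ids of all objects I want to include in my search. My application already has this information, so this step is cheap.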
SELECT * FROM attribute WHERE form_id IN (:type_id)
The result is an array of attribute_id integers.
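The `:type_id` placeholder above stands for a whole list, which neither PDO nor Python's DB-API can bind to a single placeholder. A minimal sketch, assuming `sqlite3` and a hypothetical `attribute` table with only `attribute_id` and `form_id` columns (the real table isn't shown), expands the IN list into one `?` per value and binds the values separately:

```python
import sqlite3


def in_placeholders(values):
    """Return '?, ?, ...' with one placeholder per value.

    DB-API drivers cannot bind a whole list to one placeholder."""
    if not values:
        raise ValueError("IN () with an empty list is invalid SQL")
    return ", ".join("?" for _ in values)


# Hypothetical stand-in for the `attribute` table; form_id holds the type_id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attribute (attribute_id INTEGER PRIMARY KEY, form_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?)",
    [(376, 30), (377, 30), (400, 31), (401, 32)],
)

type_ids = [30, 31]  # already known to the application
sql = (
    "SELECT attribute_id FROM attribute WHERE form_id IN (%s) ORDER BY attribute_id"
    % in_placeholders(type_ids)  # only placeholders are formatted in, never values
)
attribute_ids = [r[0] for r in conn.execute(sql, type_ids)]
print(attribute_ids)  # [376, 377, 400]
```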
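2. Search for matching objects: I compile one big SQL query that adds an INNER JOIN for each condition I want. It sounds horrible, but in the end it's the fastest approach :(
A typical generated query might look like this. The LIMIT is unfortunately necessary; otherwise I may get so many IDs that the resulting array blows up PHP or breaks the IN clause in the next query: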
SELECT DISTINCT `version`.object_id FROM `version`
INNER JOIN `version` AS condition1
ON `version`.version_id = condition1.version_id
AND condition1.created >= '2012-03-04' AND condition1.created < '2012-03-05' -- Filter by version date (created is a DATETIME)
INNER JOIN `value` AS condition2
ON `version`.version_id = condition2.version_id
AND condition2.type_id IN (:type_id) -- try to limit joins to object types we need
AND condition2.attribute_id = :field_id2 -- searching for a value in a specific attribute
AND condition2.value = 'Munich' -- searching for the value 'Munich'
INNER JOIN `value` AS condition3
ON `version`.version_id = condition3.version_id
AND condition3.type_id IN (:type_id) -- try to limit joins to object types we need
AND condition3.attribute_id = :field_id3 -- searching for a value in a specific attribute
AND condition3.value = 'Green' -- searching for the value 'Green'
WHERE `version`.type_id IN (:type_id) ORDER BY `version`.version_id DESC LIMIT 10000
The result contains every object I might need. I select object_ids rather than version_ids because I need all versions of a matching object, no matter which version matched.
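A sketch of how the step-2 query could be generated, assuming `sqlite3` and hypothetical test data. `build_search` is a hypothetical helper that emits one INNER JOIN on `value` per (attribute_id, text) condition and binds every value as a parameter. One deliberate deviation from the query above: instead of `SELECT DISTINCT ... ORDER BY version.version_id`, it groups by `object_id` and orders by `MAX(version_id)`, so the order is well defined when several versions of one object match.

```python
import sqlite3


def build_search(type_ids, conditions, limit=10000):
    """Return (sql, params) for the step-2 search: one INNER JOIN on `value`
    per (attribute_id, text) condition, all tied to the same version row."""
    type_ph = ", ".join("?" for _ in type_ids)  # one placeholder per type_id
    joins, params = [], []
    for i, (attribute_id, text) in enumerate(conditions):
        alias = f"c{i}"  # generated alias, like condition_d4e328e33813
        joins.append(
            f"INNER JOIN value AS {alias}"
            f" ON version.version_id = {alias}.version_id"
            f" AND {alias}.type_id IN ({type_ph})"
            f" AND {alias}.attribute_id = ?"
            f" AND {alias}.value = ?"
        )
        params += [*type_ids, attribute_id, text]
    sql = (
        "SELECT version.object_id FROM version "
        + " ".join(joins)
        + f" WHERE version.type_id IN ({type_ph})"
        # GROUP BY instead of DISTINCT: objects ordered by newest matching version
        + " GROUP BY version.object_id"
        + " ORDER BY MAX(version.version_id) DESC LIMIT ?"
    )
    params += [*type_ids, limit]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE version (version_id INTEGER PRIMARY KEY, type_id INTEGER, object_id INTEGER);
    CREATE TABLE value (value_id INTEGER PRIMARY KEY, object_id INTEGER,
                        attribute_id INTEGER, version_id INTEGER,
                        type_id INTEGER, value TEXT);
""")
# Hypothetical data: object 2 has two versions; version 4 has another type.
conn.executemany("INSERT INTO version VALUES (?, ?, ?)",
                 [(1, 30, 1), (2, 30, 2), (3, 30, 2), (4, 31, 3)])
conn.executemany(
    "INSERT INTO value (object_id, attribute_id, version_id, type_id, value)"
    " VALUES (?, ?, ?, ?, ?)",
    [(1, 377, 1, 30, "Munich"), (1, 376, 1, 30, "Green"),
     (2, 377, 2, 30, "Munich"), (2, 376, 2, 30, "Red"),
     (2, 377, 3, 30, "Munich"), (2, 376, 3, 30, "Green"),
     (3, 377, 4, 31, "Munich"), (3, 376, 4, 31, "Green")],
)

sql, params = build_search([30], [(377, "Munich"), (376, "Green")])
found = [row[0] for row in conn.execute(sql, params)]
print(found)  # [2, 1]
```

Object 2 is found through version 3 only (version 2 fails the "Green" condition), and object 3 is excluded by the type filter.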
3. Sort and paginate the results: Next, I build a query that sorts the objects by a specific attribute and then paginates the resulting array.
SELECT object_id
FROM value
WHERE object_id IN (:foundObjects)
AND attribute_id = :attribute_id_to_sort
AND value > ''
GROUP BY object_id -- one row per object, even if several versions have a value
ORDER BY MIN(value) ASC LIMIT :limit OFFSET :offset
The result is a sorted, paginated list of the object IDs from the previous search.
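A sketch of step 3 under the same assumptions (`sqlite3`, hypothetical data in which attribute 5 is the sort attribute; `sort_and_page` is a hypothetical helper). It uses `GROUP BY object_id` with `ORDER BY MIN(value)` rather than `DISTINCT`, so each object appears exactly once even when several of its versions carry a value, and adds `object_id` as a tie-breaker so pages stay stable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE value (object_id INTEGER, attribute_id INTEGER,
                                    version_id INTEGER, value TEXT)""")
# Hypothetical data: attribute 5 is the sort attribute; object 2 has two versions.
conn.executemany("INSERT INTO value VALUES (?, ?, ?, ?)", [
    (1, 5, 1, "Munich"),
    (2, 5, 2, "Berlin"),
    (2, 5, 3, "Zurich"),
    (3, 5, 4, ""),        # empty value: excluded by value > ''
    (4, 5, 5, "Hamburg"),
])


def sort_and_page(found_objects, sort_attribute_id, limit, offset):
    placeholders = ", ".join("?" for _ in found_objects)
    sql = f"""
        SELECT object_id
        FROM value
        WHERE object_id IN ({placeholders})
          AND attribute_id = ?
          AND value > ''
        GROUP BY object_id            -- one row per object
        ORDER BY MIN(value) ASC, object_id ASC
        LIMIT ? OFFSET ?
    """
    params = [*found_objects, sort_attribute_id, limit, offset]
    return [r[0] for r in conn.execute(sql, params)]


print(sort_and_page([1, 2, 3, 4], 5, limit=2, offset=0))  # [2, 4]
print(sort_and_page([1, 2, 3, 4], 5, limit=2, offset=2))  # [1]
```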
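4. Fetch the complete objects, versions and attributes: In the last step, I select all values and versions of every object found by the previous queries.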
SELECT `value`.*, `object`.*, `version`.*, `type`.*,
`object`.status AS `object.status`,
`object`.flag AS `object.flag`,
`version`.created AS `version.created`,
`version`.status AS `version.status`
FROM `version`
INNER JOIN `type` ON `version`.type_id = `type`.type_id
INNER JOIN `object` ON `version`.object_id = `object`.object_id
LEFT JOIN `value` ON `version`.version_id = `value`.version_id
WHERE `version`.object_id IN (:sortedObjectIds) AND `version`.type_id IN (:typeIds)
ORDER BY `version`.created DESC
The results are then assembled in PHP into a nice object -> version -> value array structure.
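The PHP assembly step could look roughly like this Python sketch. It assumes each row is a dict keyed by the column aliases from the step-4 query (`object.status`, `version.created`) plus `object_id`, `version_id`, `attribute_id` and `value`; `assemble` and `make_row` are hypothetical helpers. Because the step-4 query is ordered by `version.created`, the page order from step 3 is passed in to restore the sort.

```python
def assemble(rows, object_order):
    """Group flat step-4 rows into object -> versions -> values.

    Returns a list of (object_id, object_dict) pairs in object_order."""
    objects = {}
    for row in rows:
        obj = objects.setdefault(row["object_id"], {
            "status": row["object.status"],
            "versions": {},
        })
        version = obj["versions"].setdefault(row["version_id"], {
            "created": row["version.created"],
            "values": {},
        })
        # The LEFT JOIN yields attribute_id None for a version without values.
        if row["attribute_id"] is not None:
            version["values"][row["attribute_id"]] = row["value"]
    # Restore the page order from step 3; step 4 is ordered by version.created.
    return [(oid, objects[oid]) for oid in object_order if oid in objects]


def make_row(object_id, status, version_id, created, attribute_id, value):
    # Hypothetical helper: builds a dict shaped like one step-4 result row.
    return {"object_id": object_id, "object.status": status,
            "version_id": version_id, "version.created": created,
            "attribute_id": attribute_id, "value": value}


rows = [
    make_row(2, "Green", 3, "2012-03-05", 1, "Munich"),
    make_row(2, "Green", 3, "2012-03-05", 2, "Germany"),
    make_row(1, "Red", 1, "2012-03-04", 1, "Berlin"),
    make_row(2, "Green", 2, "2012-03-01", None, None),  # version with no values
]
result = assemble(rows, object_order=[1, 2])
print([oid for oid, _ in result])  # [1, 2]
```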
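Now the questions:
- Can this whole mess be sped up in any way?
- Can I somehow remove the LIMIT 10000 from my search query?
If all else fails, should I switch to a different database technology? See my other question: Database optimized for searching in large number of objects with different attributes
Real-life sample
Table sizes: object: 193801 rows; version: 193841 rows; value: 1053928 rows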
SELECT * FROM attribute WHERE attribute_id IN (30)
SELECT DISTINCT `version`.object_id
FROM version
INNER JOIN value AS condition_d4e328e33813
ON version.version_id = condition_d4e328e33813.version_id
AND condition_d4e328e33813.type_id IN (30)
AND condition_d4e328e33813.attribute_id IN (377)
AND condition_d4e328e33813.value LIKE '%e%'
INNER JOIN value AS condition_2c870b0a429f
ON version.version_id = condition_2c870b0a429f.version_id
AND condition_2c870b0a429f.type_id IN (30)
AND condition_2c870b0a429f.attribute_id IN (376)
AND condition_2c870b0a429f.value LIKE '%s%'
WHERE version.type_id IN (30)
ORDER BY version.version_id DESC LIMIT 10000 -- limit to 10000 or it breaks!
EXPLAIN:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|-------|
| 1 | SIMPLE | condition_2c870b0a429f | ref | field_id,action_id,form_id | field_id | 4 | const | 178639 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | action | eq_ref | PRIMARY | PRIMARY | 8 | condition_2c870b0a429f.action_id | 1 | Using where |
| 1 | SIMPLE | condition_d4e328e33813 | ref | field_id,action_id,form_id | action_id | 8 | action.action_id | 11 | Using where; Distinct |
Object search completed (peak RAM: 5.91MB, time: 4.64s)
SELECT DISTINCT object_id
FROM version
WHERE object_id IN (193793,193789, ... ,135326,135324) -- 10000 ids in here!
ORDER BY created ASC
LIMIT 50 OFFSET 0
Object sorting completed (peak RAM: 6.68MB, time: 0.352s)
SELECT `value`.*, object.*, version.*, type.*,
object.status AS `object.status`,
object.flag AS `object.flag`,
version.created AS `version.created`,
version.status AS `version.status`,
version.flag AS `version.flag`
FROM version
INNER JOIN type ON version.type_id = type.type_id
INNER JOIN object ON version.object_id = object.object_id
LEFT JOIN value ON version.version_id = `value`.version_id
WHERE version.object_id IN (135324,135326,...,135658,135661) AND version.type_id IN (30)
ORDER BY quality DESC, version.created DESC
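Object load query completed (peak RAM: 6.68MB, time: 0.083s)
Objects assembled into complete array (peak RAM: 6.68MB, time: 0.007s)
Presumably 'value_id' is meaningless by now; you could just as easily use (object_id, attribute_id) as the PK? And owner_id and type are always the same for a given object (so they are redundant values in the 'value' table)? – Strawberry
value_id has no meaning; I just always add an id column. Does it slow anything down? – ToBe
Just to be clear: with the current mechanism the speed is fair enough. The problem is that I have to limit the search results to 10000 returned IDs. Speed would only be an issue if I built everything in one query, but since MySQL can't do data cubes on EAV data, I can't do that anyway, at least as far as I know. – ToBe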