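Find rows that have 3 out of 5 fields in common - how to speed up the query?
The query below works fine, but it is slow. On a table with about 7,500 rows it takes roughly 30 seconds to run. How can I speed it up?
The goal is to find "almost duplicate" rows within the same table. Two rows count as a hit when at least 3 of the 5 compared fields match.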
SELECT
originalTable.id,
originalTable.lastname,
originalTable.firstname,
originalTable.address,
originalTable.city,
originalTable.email
FROM
address as originalTable,
address as compareTable
WHERE
# do not find the same record
originalTable.id != compareTable.id and
# at least 3 out of those 5 should match
(originalTable.firstname = compareTable.firstname) +
(originalTable.lastname = compareTable.lastname) +
(originalTable.address = compareTable.address and originalTable.address != '') +
(originalTable.city = compareTable.city and originalTable.city != '') +
(originalTable.email = compareTable.email and originalTable.email != '')
>= 3
GROUP BY
originalTable.id
ORDER BY
originalTable.lastname asc,
originalTable.firstname asc,
originalTable.city asc
Thanks for any optimization tips.
Does 'originalTable.id != compareTable.id' make sense here? – ajreal
Yes. Without it, I would find every single record, because it would compare record 200 with record 200 and... oh, what do you know... they are duplicates! :) – sprain
What you are doing is effectively a Cartesian product filtered only by that inequality, which means '7500 x 7499 = 56242500' row combinations ... – ajreal
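To make the pair count concrete, here is a minimal sketch using Python's built-in sqlite3 module instead of MySQL. That substitution is an assumption made only so the example is self-contained. The four sample rows are invented, and the `#` comments from the original query are dropped because SQLite does not accept them. The sketch runs the same 3-of-5 comparison and counts how many row pairs the self-join has to evaluate, which is n x (n - 1).

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address (id INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT,"
    " address TEXT, city TEXT, email TEXT)"
)

# Invented sample data, not taken from the original post.
rows = [
    (1, "Smith", "Anna", "1 Main St", "Bern", "anna@example.com"),
    (2, "Smith", "Anna", "1 Main St", "Basel", ""),  # 3 of 5 fields match row 1
    (3, "Jones", "Bob", "", "", ""),                 # empty fields never count
    (4, "Jones", "Rob", "", "", ""),                 # only lastname matches row 3
]
conn.executemany("INSERT INTO address VALUES (?, ?, ?, ?, ?, ?)", rows)

n = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]

# Number of row pairs the self-join produces before the 3-of-5 filter applies.
pairs_compared = conn.execute(
    "SELECT COUNT(*) FROM address AS originalTable, address AS compareTable"
    " WHERE originalTable.id != compareTable.id"
).fetchone()[0]

# Same matching logic as the query in the question, reduced to the id column.
near_duplicate_ids = [r[0] for r in conn.execute("""
    SELECT originalTable.id
    FROM address AS originalTable, address AS compareTable
    WHERE originalTable.id != compareTable.id
      AND (originalTable.firstname = compareTable.firstname)
        + (originalTable.lastname = compareTable.lastname)
        + (originalTable.address = compareTable.address AND originalTable.address != '')
        + (originalTable.city = compareTable.city AND originalTable.city != '')
        + (originalTable.email = compareTable.email AND originalTable.email != '')
        >= 3
    GROUP BY originalTable.id
    ORDER BY originalTable.id
""")]

print(pairs_compared, near_duplicate_ids)  # prints: 12 [1, 2]
```

With 7,500 rows the same formula gives the 56,242,500 comparisons mentioned in the comment above, and each of those pairs is checked against all five fields.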