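When comparing dm() output I use the function below, to allow for another level of fuzziness. A direct comparison fails for a lot of names (for example, dm('smith') != dm('schmitt')), including common misspellings of my own name.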
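The function produces a match weighting between 0.0 and 1.0 (I hope), which lets me rank each returned row and pick the good ones. 0.3 is a pretty good threshold for catching the odd pronunciations; 0.5 is the more usual value.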
i.e. dmcompare(dm("boothroyd"), dm("boofreed")) = 0.3
dmcompare(dm("smith"), dm("scmitt")) = 0.5
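Please note that this compares the double metaphone strings, not the original strings. That is for performance: my database contains the metaphone columns as well as the raw strings.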
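As a rough sanity check of the 0.5 figure, the function below can also be called on metaphone strings directly, without going through dm(). The two literals are the dm() outputs for 'smith' and 'schmitt' as quoted in the function's header comment; the trailing comments are only a hand trace of which branches fire.

```sql
-- Assumes dmcompare() from this answer has already been created.
-- 'SM0;XMT' and 'XMT;SMT' are the dm() outputs for 'smith' and 'schmitt'
-- as quoted in the function's header comment.
SELECT dmcompare('SM0;XMT', 'XMT;SMT') AS weight;

-- Primaries:             'SM0' vs 'XMT' differ, and so do their SOUNDEX codes (S... vs X...)
-- Left sec vs right pri: 'XMT' =  'XMT' adds 0.3, the nested SOUNDEX check adds 0.2
-- Left pri vs right sec: 'SM0' vs 'SMT' differ, nothing added
-- Secondaries:           'XMT' vs 'SMT' differ, and so do their SOUNDEX codes
-- weight = 0.5
```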
CREATE FUNCTION `dmcompare`(leftValue VARCHAR(55), rightValue VARCHAR(55))
RETURNS DECIMAL(2,1)
NO SQL
BEGIN
    -- -------------------------------------------------------------------------------------
    -- Compare two (double) metaphone strings for potential similarity, i.e.
    -- dm("smith") != dm("schmitt") :: "SM0;XMT" != "XMT;SMT"
    -- dmcompare(dm('smith'), dm('schmitt')) returns 0.5
    -- @author: P.Boothroyd
    -- @version: 0.9, 08/01/2013
    -- The values here can still be played with
    -- (c) GNU P L - feel free to share and adapt, but please acknowledge the original code
    -- -------------------------------------------------------------------------------------
    DECLARE leftPri, leftSec, rightPri, rightSec VARCHAR(55) DEFAULT '';
    DECLARE sepPos INT;
    DECLARE retValue DECIMAL(2,1);
    DECLARE partMatch BOOLEAN;

    -- Extract the metaphone tags
    SET sepPos = LOCATE(";", leftValue);
    IF sepPos = 0 THEN
        SET sepPos = LENGTH(leftValue) + 1;
    END IF;
    SET leftPri = LEFT(leftValue, sepPos - 1);
    SET leftSec = MID(leftValue, sepPos + 1, LENGTH(leftValue) - sepPos);

    SET sepPos = LOCATE(";", rightValue);
    IF sepPos = 0 THEN
        SET sepPos = LENGTH(rightValue) + 1;
    END IF;
    SET rightPri = LEFT(rightValue, sepPos - 1);
    SET rightSec = MID(rightValue, sepPos + 1, LENGTH(rightValue) - sepPos);

    -- Calculate likeness factor
    SET retValue = 0;
    SET partMatch = FALSE;

    -- Primaries equal 50% match
    IF leftPri = rightPri THEN
        SET retValue = retValue + 0.5;
        SET partMatch = TRUE;
    ELSE
        IF SOUNDEX(leftPri) = SOUNDEX(rightPri) THEN
            SET retValue = retValue + 0.3;
            SET partMatch = TRUE;
        END IF;
    END IF;

    -- Left secondary equals right primary: worth 30%, plus 20% from the nested
    -- SOUNDEX check, which always holds for equal strings (50% in total)
    IF leftSec = rightPri THEN
        SET retValue = retValue + 0.3;
        SET partMatch = TRUE;
        IF SOUNDEX(leftSec) = SOUNDEX(rightPri) THEN
            SET retValue = retValue + 0.2;
            SET partMatch = TRUE;
        END IF;
    END IF;

    -- Left primary equals right secondary: worth 30%, plus 20% from the nested
    -- SOUNDEX check, which always holds for equal strings (50% in total)
    IF leftPri = rightSec THEN
        SET retValue = retValue + 0.3;
        SET partMatch = TRUE;
        IF SOUNDEX(leftPri) = SOUNDEX(rightSec) THEN
            SET retValue = retValue + 0.2;
            SET partMatch = TRUE;
        END IF;
    END IF;

    -- Are secondary values the same or both empty
    IF leftSec = rightSec THEN
        -- No secondaries ...
        IF leftSec = '' THEN
            -- If there is prior matching then no secondaries is 40%
            IF partMatch = TRUE THEN
                SET retValue = retValue + 0.4;
            END IF;
        ELSE
            -- If the secondaries match then 50% match
            SET retValue = retValue + 0.5;
        END IF;
    ELSE
        -- Secondaries differ but sound alike: 30%, only when the left one
        -- is empty and something has already matched
        IF SOUNDEX(leftSec) = SOUNDEX(rightSec) THEN
            IF leftSec = '' THEN
                IF partMatch = TRUE THEN
                    SET retValue = retValue + 0.3;
                END IF;
            END IF;
        END IF;
    END IF;

    RETURN (retValue);
END
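A ranked lookup might then look like the sketch below. The table `people` and its columns `surname` and `surname_dm` (the stored dm() output) are hypothetical names used only for illustration, and dm() is assumed to be the double metaphone function used throughout this answer. The 0.3 cut-off is the looser of the two thresholds mentioned above.

```sql
-- Hypothetical schema: people(surname, surname_dm), where surname_dm
-- holds the precomputed dm() output for surname.
SET @needle = dm('schmitt');   -- compute the search metaphone once

SELECT surname, weight
FROM (
    -- Compare stored metaphone strings only; the raw names are never re-encoded
    SELECT p.surname,
           dmcompare(p.surname_dm, @needle) AS weight
    FROM people AS p
) AS scored
WHERE weight >= 0.3            -- use 0.5 for a stricter match
ORDER BY weight DESC;
```

Because dmcompare() is evaluated for every row, a query like this scans the whole table; narrowing the candidates first with some cheaper filter may be worth considering on large tables.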
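Please feel free to use this code. If you do, please credit P.Boothroyd as the source, whatever the use, i.e. even if you change the values etc.
Cheers, Paul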
The link is broken. – 2015-11-16 20:39:06
The MySQL (and Python) code is now on GitHub: https://github.com/AtomBoy/double-metaphone – Andrew 2016-05-23 20:30:30