MySQL's SOUNDEX() function comes very close. Read more about it here.
Example:
create table test(id int auto_increment, a varchar(255), primary key(id));
insert into test(a) values
('accountancy'),
('accountant'),
('accountants'),
('accounting'),
('accountingc'),
('becounting'),
('asdf'),
('this is a test');
select
    test.*,
    SOUNDEX(a),
    SOUNDEX('accountancy')
FROM
    test
WHERE a SOUNDS LIKE 'accountancy';
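To get a feel for why some rows match and others do not, here is a rough Python sketch of a Soundex-style encoding. It is an illustration under assumptions, not MySQL's actual implementation: it follows the MySQL manual's description (the result is not truncated to four characters, and vowels are discarded before duplicates), and the name `soundex_sketch` is made up for this example. Python is used only so the logic can be run outside the database.

```python
# Letter groups and their Soundex digits; letters not listed
# (vowels, H, W, Y) carry no code and are dropped.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex_sketch(word):
    """Hypothetical helper: Soundex-style code, not cut to 4 characters."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    # Step 1: discard uncoded letters (vowels, H, W, Y) first.
    digits = [CODES[ch] for ch in letters[1:] if ch in CODES]
    # Step 2: collapse runs of the same digit, including a run that
    # starts with the code of the first letter.
    out = []
    last = CODES.get(first)
    for d in digits:
        if d != last:
            out.append(d)
        last = d
    # Pad short codes with zeros up to the classic length of 4.
    return (first + "".join(out)).ljust(4, "0")


for w in ("accountancy", "accounting", "accountant", "asdf"):
    print(w, soundex_sketch(w))
```

Under this sketch 'accountancy' and 'accounting' get the same code while 'accountant' does not, which is the kind of comparison SOUNDS LIKE makes. The exact codes MySQL returns may differ, so check them with the query above.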
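If that doesn't solve the problem, the Levenshtein algorithm is the way to go. Talk to your database administrator and ask whether you are allowed to create functions. If so, here is a solution (I didn't write the function, credit goes to anonymous):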
DELIMITER //
CREATE FUNCTION levenshtein(s1 VARCHAR(255), s2 VARCHAR(255))
RETURNS INT
DETERMINISTIC
BEGIN
    DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
    DECLARE s1_char CHAR;
    -- max strlen=255
    -- cv0 and cv1 each hold one row of edit costs, one byte per cost
    DECLARE cv0, cv1 VARBINARY(256);
    SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
    IF s1 = s2 THEN
        RETURN 0;
    ELSEIF s1_len = 0 THEN
        RETURN s2_len;
    ELSEIF s2_len = 0 THEN
        RETURN s1_len;
    ELSE
        -- first row: cost of building each prefix of s2 from an empty string
        WHILE j <= s2_len DO
            SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
        END WHILE;
        WHILE i <= s1_len DO
            SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
            WHILE j <= s2_len DO
                -- insertion
                SET c = c + 1;
                IF s1_char = SUBSTRING(s2, j, 1) THEN
                    SET cost = 0;
                ELSE
                    SET cost = 1;
                END IF;
                -- substitution (or match when cost = 0)
                SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
                IF c > c_temp THEN
                    SET c = c_temp;
                END IF;
                -- deletion
                SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
                IF c > c_temp THEN
                    SET c = c_temp;
                END IF;
                SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
            END WHILE;
            SET cv1 = cv0, i = i + 1;
        END WHILE;
    END IF;
    RETURN c;
END//
DELIMITER ;
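If creating functions on the server turns out not to be allowed, the same distance can be computed in application code. The following is a minimal Python sketch of the two-row dynamic-programming approach that the stored function appears to use. It is an illustration, not a port verified against MySQL, and the word list is simply the test data from this answer.

```python
def levenshtein(s1, s2):
    """Edit distance between s1 and s2 using two rows of costs."""
    if s1 == s2:
        return 0
    if not s1:
        return len(s2)
    if not s2:
        return len(s1)
    # prev[j] = distance between the processed prefix of s1 and s2[:j]
    prev = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, 1):
        cur = [i]  # distance from s1[:i] to the empty string
        for j, ch2 in enumerate(s2, 1):
            cost = 0 if ch1 == ch2 else 1
            cur.append(min(
                cur[j - 1] + 1,       # insertion
                prev[j - 1] + cost,   # substitution or match
                prev[j] + 1,          # deletion
            ))
        prev = cur
    return prev[-1]


words = ["accountancy", "accountant", "accountants", "accounting",
         "accountingc", "becounting", "asdf", "this is a test"]
# Keep only words within 3 edits of the search term.
close = [w for w in words if levenshtein(w, "accountancy") <= 3]
print(close)
```

The threshold of 3 mirrors the query in this answer and can be any value you like.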
Test data again:
create table leven(id int auto_increment, a varchar(255), primary key(id));
insert into leven(a) values
('accountancy'),
('accountant'),
('accountants'),
('accounting'),
('accountingc'),
('becounting'),
('asdf'),
('this is a test')
;
select
    leven.*,
    levenshtein(leven.a, 'accountancy')
from
    leven
where levenshtein(leven.a, 'accountancy') <= 3; /*or any value you like*/
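Thanks. I'd love to use Sphinx and have dabbled with it briefly before, but I'm still stuck on shared hosting, so that's not an option this time. I don't see data size being a problem for this site. Hopefully, if it takes off, I'll be able to upgrade the hosting in the future. –
You could install the PECL package: http://pecl.php.net/package/stem or check whether it's already installed? Never mind, you'd need Snowball as well. – Nin
Added a PHP stemming script that you should be able to use in a shared environment. Note that it may not be perfect. – Nin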