I want to compute and count all the trigrams of a string in SQL Server. How can I do that?
For example, if the string is hello
I want the following output:
Trigram Count
------- -----
hel     1
ell     1
llo     1
lo-     1
I still don't know what an N-gram is, but based on Ed's answer, is this what you need?
declare @string varchar(max) = 'hello'
declare @n int = 3
set @string = @string + REPLICATE('-',@n - (len(@string) % @n))
;with n as
(
SELECT 1 AS i
UNION ALL
SELECT i+1
FROM n
WHERE i <= (LEN(@string)-@n)
)
select SUBSTRING(@string, i, @n), COUNT(*)
from n
group by SUBSTRING(@string, i, @n)
option (maxrecursion 0)
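One thing to watch in the padding step above: when the length is already a multiple of @n, the expression @n - (len(@string) % @n) evaluates to @n, so a full extra group of dashes gets appended. Here is a minimal sketch of a variant that pads by zero in that case, assuming that is the behaviour you want. The extra % @n is my own addition and not part of the answer above, and the column aliases are just illustrative names.

```sql
declare @string varchar(max) = 'hellos'  -- length 6, already a multiple of 3
declare @n int = 3

-- Taking the result modulo @n again turns a pad of @n into a pad of 0;
-- every other pad length is left unchanged.
select
    @n - (len(@string) % @n)        as original_pad,  -- 3
    (@n - (len(@string) % @n)) % @n as adjusted_pad   -- 0
```

For 'hello' (length 5) both expressions give 1, so the output requested in the question is unaffected.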
A version based on Martin Smith's answer, from Ed and Martin:
declare @string varchar(max) = 'hello'
SET @string = (SELECT CASE LEN(@string) % 3
WHEN 1 THEN @string + '--'
WHEN 2 THEN @string + '-'
ELSE @string
END)
;with n as
(
SELECT 1 AS i
UNION ALL
SELECT i+1
FROM n
WHERE i < (LEN(@string)-2)
)
select SUBSTRING(@string, i, 3) AS Trigram, COUNT(*) AS Count
from n
group by SUBSTRING(@string, i, 3)
option (maxrecursion 0)
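Apart from the added logic to pad the string out with - to a number of characters divisible by 3, I think this is a correct implementation: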
declare @string varchar(max) = 'here kitty kitty'
SET @string = replace(@string, ' ', '-') --Wikipedia says this should be underscore, not dash
;with n as
(
SELECT 1 AS i
UNION ALL
SELECT i + 1
FROM n
WHERE i < (LEN(@string)-2)
)
select SUBSTRING(@string, i, 3) AS Trigram, COUNT(*) AS Count
from n
group by SUBSTRING(@string, i, 3)
option (maxrecursion 0)
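I don't know whether this is correct, but +1. Have you read the Wikipedia article? Edit: I see you have! – 2010-09-30 16:30:06
I read the brief trigram page (http://en.wikipedia.org/wiki/Trigram). This algorithm gives the same output as that page. – RedFilter 2010-09-30 16:33:24
Please provide sample input and the desired output. – RedFilter 2010-09-30 15:41:11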