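Hey, I have 2 tables with many columns, and I want to find the rows where the value of table1.somecolumn is contained in table2.someothercolumn (T-SQL). Example:
table1.somecolumn has Smith, Peter and
table2.someothercolumn has peter.smith
That should be a match. How would I do such a search?
Thanks :)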
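There are several possible solutions, depending on exactly what you need. You could create an auxiliary table that stores the keywords for each record.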
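A minimal sketch of what such an auxiliary keyword table might look like. The table name NameKeywords, its columns, and the '1'/'2' source convention are all hypothetical, and how the table gets populated (for example by splitting names on commas, dots, and spaces) is left open. It also assumes a case-insensitive collation, so that Smith and smith compare equal.

```sql
-- Hypothetical helper table: one row per keyword found in a source row.
CREATE TABLE NameKeywords
(
    source  char(1)       NOT NULL,  -- '1' = table1, '2' = table2 (assumed convention)
    row_id  int           NOT NULL,  -- key of the source row (assumed to be an int)
    keyword nvarchar(128) NOT NULL   -- e.g. 'smith', 'peter'
);

-- Pair up rows from the two tables by the number of keywords they share.
SELECT
    k1.row_id AS table1_id,
    k2.row_id AS table2_id,
    COUNT(*)  AS shared_keywords
FROM NameKeywords AS k1
JOIN NameKeywords AS k2
    ON k2.keyword = k1.keyword
WHERE k1.source = '1'
  AND k2.source = '2'
GROUP BY k1.row_id, k2.row_id
ORDER BY shared_keywords DESC;
```

With this layout, 'Smith, Peter' and 'peter.smith' would each contribute the keywords smith and peter, so the pair would share two keywords regardless of word order.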
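You can also try the SOUNDEX or DIFFERENCE functions to help match string literals.
Example:
select difference('peter.green', 'Green, Peter')
returns 2, where:
The integer returned is the number of characters in the SOUNDEX values that are the same. The return value ranges from 0 through 4: 0 indicates weak or no similarity, and 4 indicates strong similarity or the same values.
See the SOUNDEX and DIFFERENCE topics on MSDN.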
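To see where a DIFFERENCE score comes from, it can help to look at the underlying SOUNDEX codes side by side. The sketch below uses made-up sample pairs rather than real table data, and relies on the VALUES table constructor, which assumes SQL Server 2008 or later.

```sql
-- Show the SOUNDEX code of each value next to the DIFFERENCE score of the pair.
SELECT
    v.a,
    v.b,
    SOUNDEX(v.a)         AS soundex_a,
    SOUNDEX(v.b)         AS soundex_b,
    DIFFERENCE(v.a, v.b) AS similarity
FROM (VALUES
        ('Green',        'green'),        -- same word, different case
        ('Green',        'Greene'),       -- spelling variant
        ('Green, Peter', 'peter.green')   -- same words, different order
     ) AS v(a, b);
```

The first two pairs should produce identical SOUNDEX codes and therefore a similarity of 4. The third pair is the case where comparing whole strings falls short, because the codes are derived from the start of each string.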
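Update:
SOUNDEX and DIFFERENCE don't work well when word order has to be taken into account. However, if you have the Full-Text Indexing feature installed, you can use the full-text engine's word-breaking and parsing capabilities without creating an index. Assuming you use SQL Server 2008, the following function returns a list of normalized terms:
SELECT * FROM sys.dm_fts_parser('"Peter Green"', 1033, 0, 0)
You can then CROSS APPLY it to the rest of your query.
See the sys.dm_fts_parser topic and section "K. Using APPLY" in the FROM topic for more information.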
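Before joining anything, it may be worth checking how the word breaker splits a value in the dotted format. The sketch below only inspects the parser output for one sample literal. The column list is a subset of what sys.dm_fts_parser returns, and the actual terms depend on the word breaker installed for the chosen language.

```sql
-- Inspect how the full-text word breaker splits a dotted name.
-- Arguments: query string, LCID (1033 = English),
--            stoplist id (0 = system stoplist), accent sensitivity (0 = insensitive).
SELECT occurrence, display_term, special_term
FROM sys.dm_fts_parser(N'"peter.green"', 1033, 0, 0)
ORDER BY occurrence;
```

If the dot is not treated as a word boundary in your environment, the term-by-term comparison will not line up, so this is a cheap check to run first.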
Example (SQL Server 2008 Enterprise with the Full-Text Search engine enabled):
-- 'U' is the object type code for a user table; drop leftovers so the script can be rerun
if OBJECT_ID('Names1', 'U') is not null drop table Names1
if OBJECT_ID('Names2', 'U') is not null drop table Names2
create table Names1
(
id int identity(0, 1),
name nvarchar(128)
)
insert into Names1 (name) values ('Green, Peter')
insert into Names1 (name) values ('Smith, Peter')
insert into Names1 (name) values ('Aadland, Beverly')
insert into Names1 (name) values ('Aalda, Mariann')
insert into Names1 (name) values ('Aaliyah')
insert into Names1 (name) values ('Aames, Angela')
insert into Names1 (name) values ('Aames, Willie')
insert into Names1 (name) values ('Aaron, Caroline')
insert into Names1 (name) values ('Aaron, Quinton')
insert into Names1 (name) values ('Aaron, Victor')
insert into Names1 (name) values ('Abbay, Peter')
insert into Names1 (name) values ('Abbott, Dorothy')
insert into Names1 (name) values ('Abbott, Bruce')
insert into Names1 (name) values ('Abbott, Bud')
insert into Names1 (name) values ('Abbott, Philip')
insert into Names1 (name) values ('Abdoo, Rose')
insert into Names1 (name) values ('Abdul, Paula')
insert into Names1 (name) values ('Abel, Jake')
insert into Names1 (name) values ('Abel, Walter')
insert into Names1 (name) values ('Abeles, Edward')
insert into Names1 (name) values ('Abell, Tim')
insert into Names1 (name) values ('Aber, Chuck')
create table Names2
(
id int identity(200, 1),
name nvarchar(128)
)
insert into Names2 (name) values (LOWER('Peter.Green'))
insert into Names2 (name) values (LOWER('Peter.Smith'))
insert into names2 (name) values (LOWER('Beverly.Aadland'))
insert into names2 (name) values (LOWER('Mariann.Aalda'))
insert into names2 (name) values (LOWER('Aaliyah'))
insert into names2 (name) values (LOWER('Angela.Aames'))
insert into names2 (name) values (LOWER('Willie.Aames'))
insert into names2 (name) values (LOWER('Caroline.Aaron'))
insert into names2 (name) values (LOWER('Quinton.Aaron'))
insert into names2 (name) values (LOWER('Victor.Aaron'))
insert into names2 (name) values (LOWER('Peter.Abbay'))
insert into names2 (name) values (LOWER('Dorothy.Abbott'))
insert into names2 (name) values (LOWER('Bruce.Abbott'))
insert into names2 (name) values (LOWER('Bud.Abbott'))
insert into names2 (name) values (LOWER('Philip.Abbott'))
insert into names2 (name) values (LOWER('Rose.Abdoo'))
insert into names2 (name) values (LOWER('Paula.Abdul'))
insert into names2 (name) values (LOWER('Jake.Abel'))
insert into names2 (name) values (LOWER('Walter.Abel'))
insert into names2 (name) values (LOWER('Edward.Abeles'))
insert into names2 (name) values (LOWER('Tim.Abell'))
insert into names2 (name) values (LOWER('Chuck.Aber'));
-- Break every name into terms with the full-text parser (no full-text index needed)
with ftsNamesFirst (id, term) as
(
    select id, terms.display_term
    from names1 cross apply sys.dm_fts_parser('"' + name + '"', 1033, 0, 0) terms
), ftsNamesSecond (id, term) as
(
    select id, terms.display_term
    from names2 cross apply sys.dm_fts_parser('"' + name + '"', 1033, 0, 0) terms
)
select * from
(
    select
        -- Rank the candidates for each Names1 row by their summed term similarity
        ROW_NUMBER() over (partition by nFirst.id order by sum(DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term)) desc) ranking,
        sum(DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term)) Confidence,
        nFirst.id Names1ID,
        nFirst.name Names1Name,
        nSecond.id Names2ID,
        nSecond.name Names2Name
    from
        ftsNamesFirst cross join ftsNamesSecond
        left outer join names1 nFirst on nFirst.id = ftsNamesFirst.id
        left outer join names2 nSecond on nSecond.id = ftsNamesSecond.id
    -- Keep only term pairs with the strongest similarity score
    where DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term) = 4
    group by
        nFirst.id, nFirst.name, nSecond.id, nSecond.name
) MatchedNames
-- Keep only the best candidate per Names1 row
where ranking = 1
Output:
The match with the highest confidence wins for each name (all other candidates are filtered out by the windowed ranking query).
Confidence  Names1ID  Names1Name        Names2ID  Names2Name
8           0         Green, Peter      200       peter.green
8           1         Smith, Peter      201       peter.smith
8           2         Aadland, Beverly  202       beverly.aadland
8           3         Aalda, Mariann    203       mariann.aalda
4           4         Aaliyah           204       aaliyah
8           5         Aames, Angela     205       angela.aames
8           6         Aames, Willie     206       willie.aames
It isn't perfect, but it's a good starting point that you can tweak to improve the chances of a successful match.