2010-08-03 176 views
1

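Check if one column value is contained in another column value (TSQL)?

Hey, I have 2 tables with many columns, and I want to find the rows where the value of table1.somecolumn is contained in table2.someothercolumn. For example:

table1.somecolumn has Smith, Peter
table2.someothercolumn has peter.smith

This should be a match. How would I do such a search?

Thanks :)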

Answers

1

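There are several possible solutions, depending on exactly what you need:

  1. Use a helper table that stores keywords for each record, or for each record and field. E.g. table_helper(id int primary key, record_id int, keyword varchar), where record_id links to the source table. Fill this table from triggers on table1 and table2. The query for common rows is then a simple intersection of table_helper with itself. You can create one helper for both table1 and table2, or use separate tables.
  2. Use a full-text index.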
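
The helper-table idea in option 1 could be sketched roughly as follows. This is only an illustration under assumptions: the source_table column, the varchar lengths, and the sample rows are hypothetical additions (the answer only names id, record_id and keyword), and the keywords are inserted by hand here rather than by triggers. It also assumes each keyword is stored once per record.

```sql
-- Hypothetical helper table: one row per (source table, record, keyword).
create table table_helper
(
    id           int identity(1, 1) primary key,
    source_table varchar(16)  not null, -- assumption: 'table1' or 'table2'
    record_id    int          not null, -- id of the row in the source table
    keyword      varchar(128) not null
)

-- Sample keywords, as triggers on table1 / table2 might have stored them.
insert into table_helper (source_table, record_id, keyword) values
    ('table1', 1,  'smith'), ('table1', 1,  'peter'), -- 'Smith, Peter'
    ('table2', 10, 'peter'), ('table2', 10, 'smith'), -- 'peter.smith'
    ('table2', 11, 'peter'), ('table2', 11, 'jones')  -- 'peter.jones'

-- Self-join on keyword; keep a pair only when every keyword
-- of the table1 record was found in the table2 record.
select h1.record_id table1_id, h2.record_id table2_id
from table_helper h1
inner join table_helper h2
    on h2.keyword = h1.keyword and h2.source_table = 'table2'
where h1.source_table = 'table1'
group by h1.record_id, h2.record_id
having count(*) = (select count(*)
                   from table_helper h
                   where h.source_table = 'table1'
                     and h.record_id = h1.record_id)
```

With the sample rows above, only the pair (1, 10) shares both keywords; record 11 shares just 'peter' and is filtered out by the having clause.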
2

You can try the SOUNDEX and DIFFERENCE functions to help match the string literals.

Example:

select difference('peter.green', 'Green, Peter') 

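returns 2, whereby:

The integer returned is the number of characters in the SOUNDEX values that are the same. The return value ranges from 0 through 4: 0 indicates weak or no similarity, and 4 indicates strong similarity or the same values.

See the SOUNDEX and DIFFERENCE topics on MSDN.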
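
To get a feel for what DIFFERENCE is comparing, it may help to look at the SOUNDEX codes of the individual words next to the whole-string comparison. The query below is only a sketch; the column aliases are made up for illustration, and the exact codes can depend on the database collation and compatibility level, so no output is shown.

```sql
select
    SOUNDEX('Green')                          soundex_green,   -- code for one word
    SOUNDEX('Peter')                          soundex_peter,
    DIFFERENCE('Green', 'green')              same_word,       -- word compared with itself
    DIFFERENCE('Green', 'Peter')              different_words, -- two unrelated words
    DIFFERENCE('peter.green', 'Green, Peter') whole_strings    -- the example above
```

Comparing word by word rather than string by string is what makes the scores meaningful when the two columns store the name parts in a different order.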

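Update:

SOUNDEX & DIFFERENCE don't work well when word order has to be taken into account, but if you have the full-text indexing feature installed, you can use the word-breaking and parsing capabilities of the full-text engine without creating an index. Assuming you're using SQL Server 2008, the following function returns a list of normalised terms: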

SELECT * FROM sys.dm_fts_parser('"Peter Green"', 1033, 0, 0) 

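which you can CROSS APPLY to the rest of your query.

See the sys.dm_fts_parser topic & section K. Using APPLY in the FROM topic for more information.

Example: (SQL Server 2008 Enterprise with the full-text search engine enabled)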

-- 'U' is the object type code for a user table
if not OBJECT_ID('Names1', 'U') is null drop table Names1 
if not OBJECT_ID('Names2', 'U') is null drop table Names2 

create table Names1 
(
    id int identity(0, 1), 
    name nvarchar(128) 
) 
insert into Names1 (name) values ('Green, Peter') 
insert into Names1 (name) values ('Smith, Peter') 
insert into Names1 (name) values ('Aadland, Beverly') 
insert into Names1 (name) values ('Aalda, Mariann') 
insert into Names1 (name) values ('Aaliyah') 
insert into Names1 (name) values ('Aames, Angela') 
insert into Names1 (name) values ('Aames, Willie') 
insert into Names1 (name) values ('Aaron, Caroline') 
insert into Names1 (name) values ('Aaron, Quinton') 
insert into Names1 (name) values ('Aaron, Victor') 
insert into Names1 (name) values ('Abbay, Peter') 
insert into Names1 (name) values ('Abbott, Dorothy') 
insert into Names1 (name) values ('Abbott, Bruce') 
insert into Names1 (name) values ('Abbott, Bud') 
insert into Names1 (name) values ('Abbott, Philip') 
insert into Names1 (name) values ('Abdoo, Rose') 
insert into Names1 (name) values ('Abdul, Paula') 
insert into Names1 (name) values ('Abel, Jake') 
insert into Names1 (name) values ('Abel, Walter') 
insert into Names1 (name) values ('Abeles, Edward') 
insert into Names1 (name) values ('Abell, Tim') 
insert into Names1 (name) values ('Aber, Chuck') 

create table Names2 
(
    id int identity(200, 1), 
    name nvarchar(128) 
) 
insert into Names2 (name) values (LOWER('Peter.Green')) 
insert into Names2 (name) values (LOWER('Peter.Smith')) 
insert into names2 (name) values (LOWER('Beverly.Aadland')) 
insert into names2 (name) values (LOWER('Mariann.Aalda')) 
insert into names2 (name) values (LOWER('Aaliyah')) 
insert into names2 (name) values (LOWER('Angela.Aames')) 
insert into names2 (name) values (LOWER('Willie.Aames')) 
insert into names2 (name) values (LOWER('Caroline.Aaron')) 
insert into names2 (name) values (LOWER('Quinton.Aaron')) 
insert into names2 (name) values (LOWER('Victor.Aaron')) 
insert into names2 (name) values (LOWER('Peter.Abbay')) 
insert into names2 (name) values (LOWER('Dorothy.Abbott')) 
insert into names2 (name) values (LOWER('Bruce.Abbott')) 
insert into names2 (name) values (LOWER('Bud.Abbott')) 
insert into names2 (name) values (LOWER('Philip.Abbott')) 
insert into names2 (name) values (LOWER('Rose.Abdoo')) 
insert into names2 (name) values (LOWER('Paula.Abdul')) 
insert into names2 (name) values (LOWER('Jake.Abel')) 
insert into names2 (name) values (LOWER('Walter.Abel')) 
insert into names2 (name) values (LOWER('Edward.Abeles')) 
insert into names2 (name) values (LOWER('Tim.Abell')) 
insert into names2 (name) values (LOWER('Chuck.Aber')); 

with ftsNamesFirst (id, term) as 
(
    select id, terms.display_term 
     from names1 cross apply sys.dm_fts_parser('"' + name + '"', 1033, 0, 0) terms 
), ftsNamesSecond (id, term) as 
(
select id, terms.display_term 
     from names2 cross apply sys.dm_fts_parser('"' + name + '"', 1033, 0, 0) terms 
) 
select * from 
(
    select 
    ROW_NUMBER() over (partition by nfirst.id order by sum(DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term)) desc) ranking, 
    sum(DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term)) Confidence, 
    nFirst.id Names1ID, 
    nFirst.name Names1Name, 
    nSecond.id Names2ID, 
    nSecond.name Names2Name 
    from 
    ftsNamesFirst cross join ftsNamesSecond 
    left outer join names1 nFirst on nFirst.id = ftsNamesFirst.id 
    left outer join names2 nSecond on nSecond.id = ftsNamesSecond.id 
    where DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term) = 4 
    group by 
     nFirst.id, nFirst.name, nSecond.id, nSecond.name 
) MatchedNames 
where ranking = 1 

Output:

The match with the highest confidence wins for each name (all others are filtered out by the windowed ranking query).

Confidence  Names1ID  Names1Name        Names2ID  Names2Name
8           0         Green, Peter      200       peter.green
8           1         Smith, Peter      201       peter.smith
8           2         Aadland, Beverly  202       beverly.aadland
8           3         Aalda, Mariann    203       mariann.aalda
4           4         Aaliyah           204       aaliyah
8           5         Aames, Angela     205       angela.aames
8           6         Aames, Willie     206       willie.aames
This isn't perfect, but it's a good starting point that you can tweak to improve the probability of a successful match.
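
One possible tweak, sketched here as an assumption rather than as part of the original answer, is to accept a pair only when every term of the Names1 value has a strong match in the Names2 value, instead of taking the top-ranked sum. It assumes the Names1 and Names2 tables from the example above already exist; the CTE names terms1, terms2 and termCounts are made up for illustration.

```sql
with terms1 (id, term) as
(
    select n.id, t.display_term
    from Names1 n cross apply sys.dm_fts_parser('"' + n.name + '"', 1033, 0, 0) t
), terms2 (id, term) as
(
    select n.id, t.display_term
    from Names2 n cross apply sys.dm_fts_parser('"' + n.name + '"', 1033, 0, 0) t
), termCounts (id, term_count) as
(
    -- number of distinct terms in each Names1 value
    select id, count(distinct term) from terms1 group by id
)
select t1.id Names1ID, t2.id Names2ID
from terms1 t1
inner join terms2 t2 on DIFFERENCE(t1.term, t2.term) = 4
inner join termCounts c on c.id = t1.id
group by t1.id, t2.id, c.term_count
-- keep a pair only when every Names1 term found a strong match
having count(distinct t1.term) = c.term_count
```

Whether this helps depends on how the word breaker splits your data; unlike the ranked query, it can return several Names2 rows for one Names1 row, or none at all.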
