This is a nice exercise. Here is my attempt using a Tally Table:
;WITH E1(N) AS( -- 10 rows
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10 x 10 = 100 rows
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 100 x 100 = 10,000 rows
E8(N) AS(SELECT 1 FROM E4 a CROSS JOIN E4 b), -- 10,000 x 10,000 = 100,000,000 rows
Tally(N) AS( -- numbers 1..length of the longest String1/String2 value
    SELECT TOP (
        SELECT
            CASE
                WHEN MAX(LEN(String1)) > MAX(LEN(String2)) THEN MAX(LEN(String1))
                ELSE MAX(LEN(String2))
            END
        FROM TestTable
    )
    ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
    FROM E8
),
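CteTable AS(-- Added an ID to uniquely identify each row
    -- CteTable is referenced twice below; ordering by real columns instead of
    -- (SELECT NULL) keeps the Id assignment the same in both references
    SELECT *, Id = ROW_NUMBER() OVER(ORDER BY String1, String2) FROM TestTable
),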
CteSubStr1 AS(
    SELECT
        ct.*,
        substr = SUBSTRING(ct.String1, t.N, 4)
    FROM CteTable ct
    CROSS APPLY(
        SELECT N FROM Tally
        WHERE N <= LEN(ct.String1) - 3
    )t
),
CteSubStr2 AS(
    SELECT
        ct.*,
        substr = SUBSTRING(ct.String2, t.N, 4)
    FROM CteTable ct
    CROSS APPLY(
        SELECT N FROM Tally
        WHERE N <= LEN(ct.String2) - 3
    )t
),
CteCommon AS(
    SELECT * FROM CteSubStr1 c1
    WHERE EXISTS(
        SELECT 1 FROM CteSubStr2
        WHERE
            Id = c1.Id
            AND substr = c1.substr
    )
)
SELECT
    String1, String2, substr
FROM (
    SELECT *, RN = ROW_NUMBER() OVER(PARTITION BY Id ORDER BY LEN(substr) DESC)
    FROM CteCommon
)t
WHERE RN = 1
Result
| String1 | String2 | substr |
|-----------|------------|--------|
| xxjohnyy | abcjohnabc | john |
| xxjohnyy | johnny | john |
| birdsings | ravenbird | bird |
| singbird | a singer | sing |
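To check the output outside SQL Server, here is a rough plain-Python sketch of the same idea (not the T-SQL itself): each string is cut into 4-character windows (the positions the Tally table supplies), and a row is kept when String1 and String2 share a window. The input pairs are the four rows from the result above; `substrings` and `first_common` are hypothetical helper names. One difference to keep in mind: Python compares strings case-sensitively, while SQL Server follows the column's collation, which is often case-insensitive.

```python
from typing import List, Optional, Tuple

N = 4  # substring length, as hard-coded in SUBSTRING(..., t.N, 4)


def substrings(s: str) -> List[str]:
    # Tally positions 1..LEN(s)-3 in T-SQL correspond to indexes 0..len(s)-4 here.
    return [s[i:i + N] for i in range(len(s) - N + 1)]


def first_common(s1: str, s2: str) -> Optional[str]:
    # Return the first 4-char window of s1 that also occurs in s2, or None.
    pool = set(substrings(s2))
    for sub in substrings(s1):
        if sub in pool:
            return sub
    return None


# Input pairs taken from the result table above.
rows: List[Tuple[str, str]] = [
    ("xxjohnyy", "abcjohnabc"),
    ("xxjohnyy", "johnny"),
    ("birdsings", "ravenbird"),
    ("singbird", "a singer"),
]

result = []
for s1, s2 in rows:
    match = first_common(s1, s2)
    if match is not None:  # rows with no shared window drop out, as in CteCommon
        result.append((s1, s2, match))

for row in result:
    print(row)
```

Each of these sample rows shares exactly one 4-character window, so the sketch reproduces the table above.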
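This part keeps one common substring per row, preferring the longest (here every substring is exactly 4 characters long, so when a row has several matches, which one is kept is arbitrary):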
SELECT
    String1, String2, substr
FROM (
    SELECT *, RN = ROW_NUMBER() OVER(PARTITION BY Id ORDER BY LEN(substr) DESC)
    FROM CteCommon
)t
WHERE RN = 1
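The RN = 1 filter can be mimicked in plain Python to see what happens when a row has more than one match. This is a sketch with hypothetical names; the Ids are assigned by hand, and the first pair is the duoew39uoie / uoewiyuoie example from the comments, which shares both uoew and uoie.

```python
from typing import Dict, List, Tuple

N = 4  # substring length used in the answer


def substrings(s: str) -> List[str]:
    return [s[i:i + N] for i in range(len(s) - N + 1)]


# Illustrative input; the Ids are hypothetical, assigned by hand.
pairs: Dict[int, Tuple[str, str]] = {
    1: ("duoew39uoie", "uoewiyuoie"),
    2: ("xxjohnyy", "johnny"),
}

# CteCommon: every window of String1 that also occurs in String2 for the same Id.
common: List[Tuple[int, str]] = [
    (row_id, sub)
    for row_id, (s1, s2) in pairs.items()
    for sub in substrings(s1)
    if sub in substrings(s2)
]

# ROW_NUMBER() OVER (PARTITION BY Id ORDER BY LEN(substr) DESC) ... WHERE RN = 1:
# sort longest first, then keep the first row seen for each Id. Python's sort is
# stable, so equal-length ties keep their original order.
best: Dict[int, str] = {}
for row_id, sub in sorted(common, key=lambda r: -len(r[1])):
    best.setdefault(row_id, sub)

print(common)
print(best)
```

Because both matches for Id 1 are 4 characters long, the length ordering cannot choose between them; Python's stable sort happens to keep the first, whereas SQL Server does not guarantee which row gets RN = 1. Adding a tie-breaker to the ORDER BY (for example the Tally position N, if you carry it through CteSubStr1) would make the T-SQL deterministic.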
To get all the common substrings instead, use:
SELECT * FROM CteCommon
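A plain-Python sketch of what SELECT * FROM CteCommon returns for a single row, using a hypothetical `all_common` helper. Because the EXISTS only filters the String1 side, a window that occurs twice in String1 comes back twice; the second call uses a made-up pair to show that.

```python
from typing import List

N = 4  # substring length used in the answer


def substrings(s: str) -> List[str]:
    # T-SQL positions 1..LEN(s)-3 map to Python indexes 0..len(s)-4.
    return [s[i:i + N] for i in range(len(s) - N + 1)]


def all_common(s1: str, s2: str) -> List[str]:
    # Mirrors CteCommon for one row: one entry per position in String1 whose
    # window also appears somewhere in String2.
    pool = set(substrings(s2))
    return [sub for sub in substrings(s1) if sub in pool]


print(all_common("duoew39uoie", "uoewiyuoie"))  # both matches are returned
print(all_common("johnjohn", "johnny"))  # hypothetical pair: "john" appears twice
```

If the duplicates are unwanted, SELECT DISTINCT on the columns you need (for example Id and substr) collapses them.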
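Not exactly sure what you want here. Are you only comparing values in the same row, i.e. t1.row1 with t2.row2? Or are you looking at every row in t1 and finding all matches in t2? Also, what do you want to do when you find a match? Join the tables and add a column indicating which 4-char string matched? And what happens if there are two 4-char matches (e.g. duoew39uoie and uoewiyuoie)? – DiscipleMichael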
How big are the two tables? This will involve a cross join with a non-trivial join condition. –
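@DiscipleMichael I want to "look at every row in t1 and find all matches in t2". Background: the client has maintained two messy Excel sheets, each with 5000 records, which will be cleaned up and moved into a database. Both tables contain an "item description", which is what needs to be matched, but the match can only be made on a substring of the description (e.g. a last name). This SQL is only used during the cleanup and import process. Basically, I'll do the join to get 95% of the work done, and the rest will be reviewed manually. –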
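The follow-up in the comments is a different shape of problem: every description in one sheet against every description in the other. Comparing all pairs of 5,000 rows means 5,000 x 5,000 = 25,000,000 pairs, each needing a substring test, which is the non-trivial cross join mentioned above. One common alternative, swapped in here and not taken from the answer, is an inverted index: map each 4-character window of the second table to the rows that contain it, then look up each window of the first table. This is a plain-Python sketch with hypothetical names (`match_tables`, list-of-strings inputs), and it lower-cases text as an assumption about a case-insensitive collation. In SQL, the same idea would be to build the equivalent of CteSubStr1 from one table and CteSubStr2 from the other, then join them on substr instead of on Id.

```python
from collections import defaultdict
from typing import Dict, List, Set

N = 4  # substring length used in the answer


def substrings(s: str) -> Set[str]:
    # Distinct 4-char windows, lower-cased (assumption: case-insensitive matching).
    s = s.lower()
    return {s[i:i + N] for i in range(len(s) - N + 1)}


def match_tables(t1: List[str], t2: List[str]) -> Dict[int, List[int]]:
    # Inverted index: window -> indexes of t2 rows containing it.
    index: Dict[str, Set[int]] = defaultdict(set)
    for j, desc in enumerate(t2):
        for sub in substrings(desc):
            index[sub].add(j)

    # For each t1 row, collect every t2 row sharing at least one window.
    matches: Dict[int, List[int]] = {}
    for i, desc in enumerate(t1):
        hits: Set[int] = set()
        for sub in substrings(desc):
            hits |= index.get(sub, set())  # .get does not insert missing keys
        matches[i] = sorted(hits)
    return matches


# Sample strings reused from the result table above.
t1 = ["xxjohnyy", "birdsings"]
t2 = ["abcjohnabc", "johnny", "ravenbird", "a singer"]
result = match_tables(t1, t2)
print(result)
```

Short, frequent windows (pieces of common words) will produce false positives, so this only narrows the candidates for the manual review mentioned above.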