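SQL Server record linkage after string matching

Background - I have a set of customer data and have used a string matching algorithm to compare the similarity of all the records. I now need to group together the results that are related to each other, either directly or by association, and apply a unique ID to each group.
Problem - I can't think of a way to link the records together and apply a unique ID to each group.
Example
The data currently looks like this, with the matches already found (MatchScore is irrelevant to the problem here; it is only included to show where the data came from).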
+-------------+-------------+------------+
| CustomerID1 | CustomerID2 | MatchScore |
+-------------+-------------+------------+
| 2021000 | 2707799 | 0.075 |
| 2021000 | 3856308 | 0.082 |
| 774062 | 774063 | 0.041 |
| 998328 | 2278386 | 0.063 |
| 998328 | 998329 | 0.058 |
| 998329 | 2278386 | 0.030 |
+-------------+-------------+------------+
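The bottom 3 records are all linked, so I would like them to be associated with the same ID.
[Image: diagram of these records all being related]
This is what I would like the data to look like: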
+----+-------------+-------------+------------+
| ID | CustomerID1 | CustomerID2 | MatchScore |
+----+-------------+-------------+------------+
| 1 | 998328 | 2278386 | 0.063 |
| 1 | 998328 | 998329 | 0.058 |
| 1 | 998329 | 2278386 | 0.030 |
| 2 | 2021000 | 2707799 | 0.075 |
| 2 | 2021000 | 3856308 | 0.082 |
| 3 | 774062 | 774063 | 0.041 |
+----+-------------+-------------+------------+
Or something similar:
+----+------------+
| ID | CustomerID |
+----+------------+
| 1 | 2278386 |
| 1 | 998328 |
| 1 | 998329 |
| 2 | 2021000 |
| 2 | 2707799 |
| 2 | 3856308 |
| 3 | 774062 |
| 3 | 774063 |
+----+------------+
Code to generate the example table:
select '998328' as CustomerID1,'998329' as CustomerID2,'0.058' as MatchScore
into #tmp
union
select '998328' as CustomerID1,'2278386' as CustomerID2,'0.063' as MatchScore
union
select '998329' as CustomerID1,'2278386' as CustomerID2,'0.030' as MatchScore
union
select '2021000' as CustomerID1,'2707799' as CustomerID2,'0.075' as MatchScore
union
select '2021000' as CustomerID1,'3856308' as CustomerID2,'0.082' as MatchScore
union
select '774062' as CustomerID1,'774063' as CustomerID2,'0.041' as MatchScore
select * from #tmp
As I said, I can't work out how to link the records together. I have tried various joins, but the eureka moment never came. Please help.
Thanks
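What do you mean by the bottom 3 records? Are they linked only because CustomerID1 is listed with multiple CustomerID2 values? Why do CustomerID1 998328 and 998329 end up with the same ID value? – Taryn
Because the 3 separate records mean that customers 998328 and 2278386 match, 998328 and 998329 match, and 998329 and 2278386 match. So all 3 are shown to match each other, and therefore they get the same ID. – DataPro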
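
The grouping described in this exchange can be read as finding connected components in a graph, where each customer is a node and each matched pair is an edge. Below is a minimal T-SQL sketch of one way to do that by repeatedly propagating the smallest CustomerID through each group until nothing changes. It assumes the #tmp table from the question already exists and that the CustomerID values can be cast to int. The names #groups, GroupKey and @changed are made up for this illustration and are not part of the original post.

```sql
-- Assumption: #tmp (CustomerID1, CustomerID2, MatchScore) exists as in the question.
-- #groups, GroupKey and @changed are hypothetical names used only in this sketch.
IF OBJECT_ID('tempdb..#groups') IS NOT NULL DROP TABLE #groups;

-- One row per distinct customer; each customer starts as its own group.
SELECT c.CustomerID, c.CustomerID AS GroupKey
INTO #groups
FROM (
    SELECT CAST(CustomerID1 AS int) AS CustomerID FROM #tmp
    UNION
    SELECT CAST(CustomerID2 AS int) FROM #tmp
) AS c;

DECLARE @changed int = 1;

WHILE @changed > 0
BEGIN
    -- Lower each customer's GroupKey to the smallest GroupKey held by
    -- any customer it is directly matched with (in either direction).
    UPDATE g
    SET GroupKey = m.MinKey
    FROM #groups AS g
    JOIN (
        SELECT e.a AS CustomerID, MIN(n.GroupKey) AS MinKey
        FROM (
            SELECT CAST(CustomerID1 AS int) AS a, CAST(CustomerID2 AS int) AS b FROM #tmp
            UNION ALL
            SELECT CAST(CustomerID2 AS int), CAST(CustomerID1 AS int) FROM #tmp
        ) AS e
        JOIN #groups AS n ON n.CustomerID = e.b
        GROUP BY e.a
    ) AS m ON m.CustomerID = g.CustomerID
    WHERE m.MinKey < g.GroupKey;

    -- Stop once a full pass updates no rows.
    SET @changed = @@ROWCOUNT;
END;

-- Turn the per-group key into a compact sequential ID.
SELECT DENSE_RANK() OVER (ORDER BY GroupKey) AS ID, CustomerID
FROM #groups
ORDER BY ID, CustomerID;
```

The final query returns the second layout from the question (ID, CustomerID). Because the groups are numbered by their smallest CustomerID, the ID values themselves will not match the example tables, but every customer in a linked group shares one ID. Joining #groups back to #tmp on CustomerID1 would give the first layout. On large data sets the loop may need many passes for long chains of matches, so indexing #groups on CustomerID is probably worthwhile.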