How can I transform the union of column-pair values into a linear table?

We have a table of duplicate customer numbers:

A varchar(16) NOT NULL, 
B varchar(16) NOT NULL

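The columns started out as old and new (delete and keep), but things shifted to a situation where neither is preferred. The columns are really just "A" and "B": two numbers for the same customer, in either order.

Furthermore, the table can have any number of pairs for the same customer. You might see something like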

a,b 
b,c

meaning that a, b, and c are all for the same customer. You might also see something like

a,b 
b,a 
c,a

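which also means that a, b, and c are all the same customer.

This is not a clean, acyclic representation such as "old" and "new" values. A customer's list of IDs is represented in this table as a clump of one or more rows, where the only connection is that a value in the A or B column of one row may also appear in the A or B column of other rows. My task is to tie them all together into one list per customer.

I would like to convert this mess into something like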

MasterKey int NOT NULL, 
CustNum varchar(16) NOT NULL UNIQUE, 
PRIMARY KEY(MasterKey, CustNum)

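where the one or more numbers for a customer share a MasterKey. As the UNIQUE constraint says, a given CustNum may not appear more than once.

So, for example, rows like these from the original table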

1a,1b 
1b,1c 
2a,2b 
2b,2c 
2d,2a 
...

should end up in the new table as

1 1a 
1 1b 
1 1c 
2 2a 
2 2b 
2 2c 
2 2d 
...

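Edit: the values above are only there to make the pattern clear. The actual customer numbers are arbitrary varchars.

My attempted solutions

This feels like a job for recursion, and therefore a CTE. But the potentially cyclic nature of the source data makes it hard for me to get the anchor case right. I tried pre-cleaning the data into a more acyclic form, but I can't seem to get that right either.

I am also stubbornly trying to do this with set-based SQL operations rather than resorting to cursors and loops. But maybe that isn't possible.

I've spent many hours thinking about this and trying different approaches, but it keeps slipping away. Any ideas or suggestions about the right approach, or even some sample code?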


2011-09-20 Greg Hendershott

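I'm going to do something I've never done before and post an answer to my own question. I owe a big thank-you to Beth and JBrooks for pointing me in the right direction. I really wanted to solve this in a set-based, declarative way. Maybe that is still possible using a CTE and recursion. But once I gave up on that and accepted that it could be done imperatively and iteratively, it became much easier.

Anyway, given this target table from my question: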

CREATE TABLE UniqueCustomers 
(
    uid  int NOT NULL, 
    gpid varchar(16) NOT NULL UNIQUE, -- Important: UNIQUE to disallow duplicates 
    PRIMARY KEY(uid, gpid) -- Important: Disallow duplicates 
)

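I came up with the following stored procedure. It can be called whenever new dupes are reported, one pair at a time. It can also be run over the legacy table, which stores them as pairs in random order.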

CREATE PROCEDURE ReportDuplicateCustomerIDs 
(
    @id1 varchar(16), 
    @id2 varchar(16) 
) 
AS 
BEGIN 
    IF @id1 <> @id2 
    BEGIN 
     -- Retrieve the uid (if any) for each of the ids 
     DECLARE @uid1 int 
     SELECT @uid1 = NULL 
     SELECT @uid1 = uid FROM UniqueCustomers WHERE gpid = @id1 

     DECLARE @uid2 int 
     SELECT @uid2 = NULL 
     SELECT @uid2 = uid FROM UniqueCustomers WHERE gpid = @id2 

     -- If we've seen NEITHER of the id's yet 
     IF @uid1 IS NULL AND @uid2 IS NULL 
     BEGIN 
      -- Add both of them using a brand-new uid 
      DECLARE @uidNew int 
      SELECT @uidNew = Max(uid) + 1 FROM UniqueCustomers 
      IF @uidNew IS NULL 
       SET @uidNew = 0 
      INSERT INTO UniqueCustomers VALUES(@uidNew, @id1) 
      INSERT INTO UniqueCustomers VALUES(@uidNew, @id2) 
     END 
     ELSE 
     BEGIN 
      -- If we've seen BOTH id's already 
      IF @uid1 IS NOT NULL AND @uid2 IS NOT NULL 
      BEGIN 
       -- If this pair bridges two existing chains. 
       IF @uid1 <> @uid2 
       BEGIN 
        -- Update everything using uid2 to use uid1 instead. 
        -- Consolidates the two dupe chains into one. 
        UPDATE UniqueCustomers SET uid = @uid1 WHERE uid = @uid2 
       END 
       -- ELSE nothing to do 
      END 
      ELSE 
       -- If we've seen only id1, then insert id2 using 
       -- the same uid that id1 is already using 
       IF @uid1 IS NOT NULL 
        INSERT INTO UniqueCustomers VALUES(@uid1, @id2) 
       -- If we've seen only id2, then insert id1 using 
       -- the same uid that id2 is already using 
       ELSE -- @uid2 IS NOT NULL 
        INSERT INTO UniqueCustomers VALUES(@uid2, @id1) 
     END 
    END 
END 
GO
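
To run the procedure over the legacy table, each stored pair has to be fed to it in turn. A minimal sketch of one way to do that with a cursor follows. The table name DuplicatePairs is hypothetical (the question never names the legacy table); its A and B columns are the ones described in the question.

```sql
-- Assumption: the legacy pairs live in DuplicatePairs(A, B).
-- That table name is hypothetical; substitute the real one.
DECLARE @a varchar(16), @b varchar(16);

DECLARE pair_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT A, B FROM DuplicatePairs;

OPEN pair_cursor;
FETCH NEXT FROM pair_cursor INTO @a, @b;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Each call starts a new chain, extends an existing one,
    -- merges two chains, or does nothing if the pair is already known.
    EXEC ReportDuplicateCustomerIDs @id1 = @a, @id2 = @b;
    FETCH NEXT FROM pair_cursor INTO @a, @b;
END

CLOSE pair_cursor;
DEALLOCATE pair_cursor;

-- Inspect the result: one uid per customer, one row per customer number.
SELECT uid, gpid FROM UniqueCustomers ORDER BY uid, gpid;
```

The order in which pairs arrive should not matter, because a pair that bridges two existing chains merges them. The resulting uid values are arbitrary, and merging can leave gaps in the numbering.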


2011-09-22 19:27:24

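I upvoted both of those answers, but accepted my own because it is the most correct and complete one. That feels odd to me, but judging by the FAQ, it is what I'm supposed to do. Thanks again, Beth and JBrooks!

Looks like a job for UNION to me. The code below assumes you can't have 1a and 2b in the same record.

create table #temp (a varchar(10), b varchar(10))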

insert into #temp 
values ('1a', '1b') 
,('1b', '1c') 
,('2a', '2b') 
,('2b', '2c') 
,('2d', '2a') 

select * from #temp 

select a, b, left (a, 1) as id into #temp2 from #temp 

select id, a from #temp2 
union 
select id, b from #temp2


2011-09-20 21:02:17 HLGEM

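Thanks for the quick answer, but I used '1a', '1b', etc. in my example only to make the desired pattern clearer. If the actual data had such a pattern, this would be much easier. :) Instead, the actual values are arbitrary varchars such as 'FOO12', 'BAR666', 'GURGLE721'. Unfortunately, we can't use Left() to derive the MasterKey value from the values.

Given the input data: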

a,b 
b,c 
d,e 
e,f 
g,d

I would add two new tables: one holding the pk values, and one, in a one-to-many relationship with the pks, holding pk and dup values, as follows:

pk 
a 
b 
c 
d 
e 
f 
g 


pk dup 
a b 
b a 
b c 
c b 
d e 
e d 
e f 
f e 
g d 
d g

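The rows in the pk/dup table are populated by going through the input, filling in the pks, and inserting each pair twice: once as (pk, dup) and once as (dup, pk).

This gets you the first set of relationships between keys and duplicates, but you then need to iterate over this set to get the indirect relationships, such as "c is a duplicate of a".

You can get those relationships by self-joining the pk/dup table on pkdup1.dup = pkdup2.pk. This joins row (a, b) with rows (b, a) and (b, c), letting you identify the relationship (a, c). It will also pick up (d, f), (f, d), and (g, e). You need to repeat the iteration to pick up (g, f).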
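
The repeated self-join described above could be sketched roughly as follows. This is one possible reading of the approach, not code from the answer: the temp table names #pairs and #pkdup, the loop variable @added, and the use of the smallest value in each group to derive a MasterKey are all assumptions made for illustration.

```sql
-- Hypothetical sample table holding the input pairs from this answer.
CREATE TABLE #pairs (A varchar(16) NOT NULL, B varchar(16) NOT NULL);
INSERT INTO #pairs VALUES ('a', 'b');
INSERT INTO #pairs VALUES ('b', 'c');
INSERT INTO #pairs VALUES ('d', 'e');
INSERT INTO #pairs VALUES ('e', 'f');
INSERT INTO #pairs VALUES ('g', 'd');

-- pk/dup table: every pair stored in both directions.
CREATE TABLE #pkdup (pk varchar(16) NOT NULL, dup varchar(16) NOT NULL);
INSERT INTO #pkdup (pk, dup)
SELECT A, B FROM #pairs
UNION
SELECT B, A FROM #pairs;

-- Repeat the self-join until it stops producing new (pk, dup) rows.
-- The first pass also produces the self pairs such as (a, a).
DECLARE @added int;
SET @added = 1;
WHILE @added > 0
BEGIN
    INSERT INTO #pkdup (pk, dup)
    SELECT DISTINCT p1.pk, p2.dup
    FROM #pkdup AS p1
    JOIN #pkdup AS p2 ON p1.dup = p2.pk
    WHERE NOT EXISTS (SELECT 1 FROM #pkdup AS p3
                      WHERE p3.pk = p1.pk AND p3.dup = p2.dup);
    SET @added = @@ROWCOUNT;
END

-- Every member of a group now sees the same smallest dup value,
-- so ranking by that value yields one MasterKey per group.
SELECT DENSE_RANK() OVER (ORDER BY MIN(dup)) AS MasterKey,
       pk AS CustNum
FROM #pkdup
GROUP BY pk
ORDER BY MasterKey, CustNum;
```

With the sample input this places a, b, c under one key and d, e, f, g under another. The closure table grows with the square of each group's size, so the sketch only suits data where each customer has a modest number of IDs.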

HTH


2011-09-20 21:29:13 Beth

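What is the pattern for finding the key? If the key is the numeric part, this will pull it out:

select substring('FOO12',patindex('%[0-9]%','FOO12'),100)

If the number comes first in the string, this will pull it out:

select substring('12FOO',1,patindex('%[A-Z]%','12FOO')-1)

Both return 12.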


2011-09-20 21:40:19 JBrooks

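I think you're going to have to do some looping. Here I look at just one row at a time, to make sure I get all the linked values that belong to a single master key.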

-- working table and variable (not declared in the original post) 
create table #temp (MasterKey varchar(16), [Key] varchar(16)) 
declare @masterKey varchar(16) 

while (1=1) 
begin 

    -- reset, otherwise the "is null" checks below never fire after the first pass 
    set @masterKey = null 

    -- get the next key that is not inserted yet as MasterKey or key 
    select top 1 @masterKey = a 
    from myTable 
    where not exists (select 1 
     from #temp 
     where #temp.MasterKey = myTable.a 
     or #temp.[Key] = myTable.a) 

    if(@masterKey is null) -- out of a's so now work the b's 
     select top 1 @masterKey = b 
     from myTable 
     where not exists (select 1 
      from #temp 
      where #temp.MasterKey = myTable.b 
      or #temp.[Key] = myTable.b) 

    if(@masterKey is null) -- totally done. 
     break 

    insert into #temp 
    (MasterKey, [Key]) -- Key is a reserved word, hence the brackets 
    values(@masterKey, @masterKey) 


    while (1=1) -- now insert anything new with this masterKey in a 
    begin 
     insert into #temp (MasterKey, [Key]) 
     select top 1 @masterKey, myTable.b 
     from myTable 
     where myTable.a = @masterKey 
     and not exists (select 1 -- the AND was missing in the original post (see the comments below) 
      from #temp 
      where #temp.MasterKey = myTable.b 
      or #temp.[Key] = myTable.b) 

     if @@rowcount < 1 
      break 
    end 


    while (1=1) -- now insert anything with this masterKey in b 
    begin 
     insert into #temp (MasterKey, [Key]) 
     select top 1 @masterKey, myTable.a 
     from myTable 
     where myTable.b = @masterKey 
     and not exists (select 1 -- same fix as above; the extra closing paren is also removed 
      from #temp 
      where #temp.MasterKey = myTable.a 
      or #temp.[Key] = myTable.a) 

     if @@rowcount < 1 
      break 

    end 

end

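You would really need to wrap the two insert sections in another loop, to make sure everything has been picked up before moving on to the next masterKey, but you get the idea.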
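
One way the extra loop mentioned above might look is sketched below. It is an interpretation, not the answer's own code: the two single-row insert loops are merged into one insert that follows links from every key already filed under the current masterKey, and it repeats until nothing new is found. The names #pairs (standing in for myTable), #groups (standing in for #temp), and @added are hypothetical.

```sql
-- Hypothetical sample data, taken from the question's example.
create table #pairs (a varchar(16) not null, b varchar(16) not null);
insert into #pairs values ('1a', '1b');
insert into #pairs values ('1b', '1c');
insert into #pairs values ('2a', '2b');
insert into #pairs values ('2b', '2c');
insert into #pairs values ('2d', '2a');

create table #groups (MasterKey varchar(16) not null, [Key] varchar(16) not null);

declare @masterKey varchar(16), @added int;

while (1=1)
begin
    set @masterKey = null;

    -- next value, from either column, that has not been assigned yet
    select top 1 @masterKey = v.val
    from (select a as val from #pairs union select b from #pairs) as v
    where not exists (select 1 from #groups g where g.[Key] = v.val);

    if @masterKey is null break; -- every value has a group

    insert into #groups (MasterKey, [Key]) values (@masterKey, @masterKey);

    -- keep following links from ANY key already in this group
    set @added = 1;
    while @added > 0
    begin
        insert into #groups (MasterKey, [Key])
        select @masterKey, v.val
        from (
            select p.b as val from #pairs p
            join #groups g on g.[Key] = p.a where g.MasterKey = @masterKey
            union
            select p.a from #pairs p
            join #groups g on g.[Key] = p.b where g.MasterKey = @masterKey
        ) as v
        where not exists (select 1 from #groups g2 where g2.[Key] = v.val);
        set @added = @@rowcount;
    end
end

select MasterKey, [Key] from #groups order by MasterKey, [Key];
```

Here the MasterKey is simply whichever member of the group was picked first. Mapping it to an int, as the question's target table requires, would be a separate step.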


2011-09-20 22:35:05 JBrooks

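This looks promising, but I don't follow the last two insert statements. In both, 'where myTable.x = @masterKey' is followed by 'not exists ...' with nothing joining the two. I wondered whether you meant to put an 'AND' there, but that doesn't seem right? There is also an extra closing paren. Sorry if I'm being dense.

No, you're right: there should be an AND between the two. The closing paren is extra, but I wasn't giving you 100% correct syntax, more an approach to solving the problem. – JBrooks

OK, got it. (I definitely don't expect you to write all the code for me :) I just wanted to make sure I was following your idea.) Thanks.

Based on some of the sample data in the comments, I think this should do the trick?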

CREATE TABLE #sample 
(A NVARCHAR(50) 
,B NVARCHAR(50)) 

INSERT INTO #sample VALUES('FOO12','12DEF') 
INSERT INTO #sample VALUES('12GHJ','12ABC') 
INSERT INTO #sample VALUES('GURGLE721','GURGLZ721') 
INSERT INTO #sample VALUES('word21','book721') 
INSERT INTO #sample VALUES('orange21','apple21') 

;WITH CTE as 
(
SELECT A 
,PATINDEX('%[A-Za-z]%',A) as text_start 
,PATINDEX('%[0-9]%',A) as num_start 
FROM #sample 
UNION ALL 
SELECT B 
,PATINDEX('%[A-Za-z]%',B) as text_start 
,PATINDEX('%[0-9]%',B) as num_start 
FROM #sample 
) 
,cte2 AS 
(
SELECT 
* 
,CASE WHEN text_start > num_start --Letters after numbers 
    THEN SUBSTRING(A,text_start - num_start + 1,99999) 
    WHEN text_start = 1 --Letters at start of string 
    THEN SUBSTRING(A,1,num_start - 1) 
    END AS letters 
,CASE WHEN num_start > text_start --Numbers after letters 
    THEN SUBSTRING(A,num_start - text_start + 1,99999) 
    WHEN num_start = 1 --Numbers at start of string 
    THEN SUBSTRING(A,1,text_start- 1) 
    END AS numbers 
FROM cte 
) 
SELECT DISTINCT 
DENSE_RANK() OVER (ORDER BY numbers ASC) as group_num 
,numbers + letters as cust_details 
FROM cte2 
ORDER BY numbers + letters asc


2011-09-21 10:09:12 Dibstar
