2016-10-28 44 views
5

美好的一天,我想爲Set Cover Problem實現T-SQL查詢,但一直無法找到有關如何在SQL中執行此操作的任何提示。Set Cover的查詢

就我而言,我的表只是有兩列(IDnumberMut),我想找到的IDNumber最小數量讓每一個Mut之一。我真的想要每Mut獲得三個IDnumbers,但我想我最好從一個開始,因爲這可能會更容易。

DECLARE @myTable TABLE (
    IDnumber int, 
    Mut varchar(1)) 

INSERT @myTable VALUES 
(1,'C'), (1,'N'), (1,'Z'), (1,'M'), (1,'E'), (2,'E'), (3,'B'), (3,'N'), (3,'D'), (3,'K'), 
(3,'W'), (4,'O'), (4,'G'), (4,'N'), (4,'B'), (4,'U'), (4,'C'), (5,'Q'), (5,'H'), (6,'K'), 
(6,'Y'), (6,'M'), (6,'A'), (6,'O'), (6,'U'), (6,'J'), (7,'H'), (7,'U'), (7,'M'), (7,'L'), 
(8,'B'), (8,'K'), (8,'P'), (9,'Y'), (9,'K'), (10,'Z'), (11,'R'), (12,'X'), (12,'R'), 
(12,'O'), (12,'Z'), (4,'C'), (1,'Z'), (4,'S'), (6,'E'), (5,'G'), (4,'C'), (4,'S'), (4,'H'), 
(6,'D'), (7,'W'), (3,'U'), (6,'N'), (7,'Y'), (6,'N'), (6,'F'), (4,'C'), (4,'I'), (7,'P'), 
(10,'H'), (10,'Z'), (10,'S'), (7,'Z'), (6,'B'), (7,'Z'), (8,'X'), (8,'J'), (8,'P'), (10,'K'), 
(8,'K'), (12,'P'), (8,'W'), (10,'M'), (12,'F'), (9,'T'), (9,'D'), (14,'Y'), (12,'P'), 
(14,'J'), (13,'D'), (15,'H'), (12,'J'), (6,'H'), (2,'Z'), (8,'G'), (10,'Q'), (6,'D'), 
(5,'X'), (9,'T'), (6,'W'), (6,'K'), (10,'W'), (7,'J'), (11,'W'), (12,'V'), (9,'F'), (7,'F'), 
(4,'M'), (5,'K'), (12,'G'), (12,'T'), (15,'T'), (13,'W'), (7,'J'), (9,'T'), (10,'U'), (9,'S'), 
(10,'L'), (10,'J'), (10,'H'), (11,'H'), (12,'S'), (12,'A'), (14,'L'), (13,'K'), (13,'D'), 
(4,'M'), (3,'N'), (4,'F'), (7,'M'), (7,'V'), (5,'R'), (4,'K'), (5,'F'), (7,'G'), (8,'M'), 
(4,'X'), (7,'F'), (9,'S'), (7,'N'), (6,'W'), (6,'W'), (5,'S'), (9,'Z'), (10,'I'), (11,'Y'), 
(11,'D'), (9,'X'), (7,'G'), (9,'S'), (9,'H'), (9,'T'), (8,'J'), (10,'U'), (9,'F'), (9,'S'), 
(7,'D'), (14,'R'), (10,'F'), (7,'E'), (15,'M'), (12,'F'), (5,'C'), (8,'E'), (16,'G'), (11,'V'), 
(10,'I'), (12,'I'), (11,'Y'), (12,'I'), (14,'J'), (15,'D'), (19,'J'), (16,'B'), (12,'G'), 
(9,'J'), (18,'J'), (18,'C'), (16,'Q'), (18,'P'), (13,'F'), (19,'T'), (15,'J'), (15,'R'), 
(15,'Q'), (15,'O'), (11,'A'), (24,'B'), (19,'S'), (22,'I'), (15,'X'), (20,'T'), (15,'E'), 
(9,'V'), (8,'H'), (16,'N'), (17,'H') 
-- Since the above list was generated by a bunch of random numbers/letters I need to 
-- delete the duplicates 

;WITH cte AS (
    SELECT IDnumber, mut, 
    row_number() OVER(PARTITION BY IDNumber, mut ORDER BY IDNumber) AS [rn] 
    FROM @myTable 
) 
DELETE cte WHERE [rn] > 1 


SELECT * 
FROM (SELECT IDnumber, Mut FROM @myTable) AS S 
PIVOT 
(COUNT(Mut) FOR mut IN ([A],[B],[C],[D],[E],[F],[G],[H],[I],[J],[K],[L],[M],[N],[O],[P], 
[Q],[R],[S],[T],[U],[V],[W],[X],[Y],[Z])) AS pvt 

所以,你可以從數據透視表看到,最小IDnumbers是3,5,7,和12

一個人怎麼會去執行的算法?在我看來,我可以找到所有組合(2^6),這將是集合,然後確定哪些集合具有所有的Muts。 IDnumbers數量最少的集合就是最小集合。

這種強力手段可能會起作用,但效率極低。我真實世界的情況並不是很大,我有43個獨特的Muts(不是在這個例子中的九個)和〜2000 IDnumbers,但我想這需要一段時間才能運行,因爲2^2000相當糟糕...

謝謝!

+0

你不能顯示你所期望的輸出嗎 – KumarHarsh

+1

這個問題構成了[相似性](http://stackoverflow.com/questions/25334417/minimal-number-of-groups-necessary-to-cover-user-product-權限)但沒有答案。此外[這個問題確實收到答案](http://stackoverflow.com/questions/28202429/dynamic-t-sql-approach-for-combinatorics-knapsack),你可能會改變你的要求 –

+0

你能提供輸入所有數據?這是我想挖掘的一個有趣的問題,讓更多的數據會有幫助。 –

回答

1

下面是其計算Mut值的掩碼爲每個IDNumber的方法,然後使用遞歸CTE來生成完成該集合的所有可能的組合。代碼輸出提供最終結果的所有可能的IdNumber組合,排除組合中有一個或多個IdNumber是多餘的(例如樣本數據集中的1,2,3,4,5,6)。

要將其轉換爲與您的真實數據一起工作,主要區別在於您幾乎肯定需要找到另一種機制來將位值分配給Mut值。 (我可以在這裏作弊並通過操作每個Mut字母的十進制ASCII碼計算比特值 - POWER(2,ASCII(Mut) - 64))。

DECLARE @myTable TABLE (
    IDnumber int, 
    Mut varchar(1)) 

INSERT @myTable VALUES 
    (1,'A'),(1,'B'),(1,'C'),(1,'D'), 
    (2,'A'),(2,'C'),(2,'D'), 
    (3,'A'),(3,'B'),(3,'F'),(3,'Z'), 
    (4,'A'),(4,'B'),(4,'E'),(4,'F'), 
    (5,'Y'), 
    (6,'X'),(6,'Z') 

-- calculate the target bitmask 
DECLARE @target bigint = ( SELECT SUM(POWER(2,ASCII(Mut) - 64)) 
          FROM (SELECT DISTINCT Mut FROM @myTable) AS x 
         ) 

;WITH baseCTE 
AS 
(
    --calculate a starting bitmask for each ID 
    SELECT IDnumber, sum(mutbit) AS bitmask 
    FROM (SELECT DISTINCT IDnumber, CAST(POWER(2,ASCII(Mut) - 64) AS bigint) AS mutbit FROM @myTable) AS x 
    GROUP BY IDnumber 
) 
,recCTE 
AS 
(
    SELECT IDnumber, bitmask, CAST(IdNumber AS varchar(max)) AS trail, 1 AS lev 
    FROM baseCTE 
    UNION ALL 
    SELECT b.IDnumber, b.bitmask | r.bitmask, CAST(CONCAT(r.trail,',',b.IDnumber) AS varchar(max)), r.lev + 1 
    FROM recCTE AS r 
    JOIN baseCTE AS b 
    ON b.IDnumber > r.IDnumber 
    WHERE b.bitmask | r.bitmask <> r.bitmask --ignore combinations which don't add any new Mut values 
) 
SELECT trail, lev 
FROM recCTE 
WHERE bitmask = @target 
ORDER BY lev 

NB。 SQL Server按位運算符僅適用於其中一個或其他值爲整數類型的情況,因此此解決方案僅限於64個不同的Mut值(其中掩碼爲bigint) - 爲了使其超出該範圍,您必須使用一個自定義的按位比較器(可能使用CLR)。但是,由於該問題提到您有值,所以現在可能適用於您。

+0

@ user918967 - 我已經對我的答案做了一個小小的更新,以處理您發佈的較大示例集中的重複項。 –

+0

我真的很喜歡你的解決方案,因爲它給我所有的設置!您是否也可以向我發送關於如何將自定義按位比較器擴展爲多於64個值的指針? – user918967

+0

@ user918967 - Adam Machanic發佈了一箇舊的(2006)系列文章,用於實現大型位掩碼的按位邏輯 - http://sqlblog.com/blogs/adam_machanic/archive/tags/bitmasks/default.aspx。對於這個問題,你只需要實現按位OR(從第3部分) –

2

我想更大的數據集進行測試,但這個搜索結果匹配至少給出的數據集...

DECLARE @myTable TABLE (
     IDnumber INT, 
    Mut VARCHAR(1)) 

DECLARE @results TABLE (
    IDnumber INT) 

INSERT @myTable VALUES 
    (1,'A'),(1,'B'),(1,'C'),(1,'D'), 
    (2,'A'),(2,'C'),(2,'D'), 
    (3,'A'),(3,'B'),(3,'F'),(3,'Z'), 
    (4,'A'),(4,'B'),(4,'E'),(4,'F'), 
    (5,'Y'), 
    (6,'X'),(6,'Z') 

DECLARE @IDnumber INT 

WHILE EXISTS (SELECT 1 FROM @myTable) 
BEGIN 

    WITH MutRank AS 
    (
     -- Find how many IDNumbers are associated with each mut 
     SELECT Mut, 
      COUNT(DISTINCT IDnumber) AS IDnumberCnt 
     FROM @myTable 
     GROUP BY Mut 
    ), MutRarity AS 
    (
     -- Translate the IDNumberCnt into a weighted rarity so dupes at 
     -- a IDNumberCnt level reduce their rarity compared to the next level 
     SELECT Mut, 
      RANK() OVER (ORDER BY IDnumberCnt DESC) AS MutRarity 
     FROM MutRank 
    ) 
    -- Grab the IDNumber that is associated to the most "rare" muts 
    SELECT @IDnumber = n.IDnumber 
    FROM (SELECT TOP 1 m.IDnumber, 
      SUM(MutRarity) AS IDNumberValue 
     FROM @myTable m 
     JOIN MutRarity mr 
      ON m.Mut = mr.Mut 
     GROUP BY m.IDnumber 
     ORDER BY IDNumberValue DESC, IDnumber) n 

    -- Save the number in our results 
    INSERT @results (IDnumber) 
    SELECT @IDnumber 

    -- Purge all records for the "covered" muts and the selected IDNumber 
    DELETE m2 
    FROM @myTable m1 
    JOIN @myTable m2 
     ON m1.Mut = m2.Mut 
     AND m1.IDnumber = @IDnumber 
END 

SELECT * 
FROM @results 
ORDER BY IDnumber 
+0

很好的解決方案,但另一種解決方案給*所有* sets.Thanks! – user918967

+0

是的,我以爲你正在尋找最低數字的最小組合的單一最佳解決方案(所以在你的原始設置1,4,5,6和2,4,5,6工作,但我想第一個回來);我以爲你試圖避免把它們全部歸還,然後得到最小的一套。這就是說,我同意,@EdHarper解決方案是一個非常優雅的解決方案。當我看到它時,我印象深刻。 –