SQL：組裝非重疊集合

我有一組連續整數，由type組織，編號爲table1。所有值在1和10之間，包括在內。SQL：組裝非重疊集合

table1: 
row_id set_id type min_value max_value 
1  1  a  1   3 
2  2  a  4   10 
3  3  a  6   10 
4  4  a  2   5 
5  5  b  1   9 
6  6  c  1   7 
7  7  c  3   10 
8  8  d  1   2 
9  9  d  3   3 
10  10  d  4   5 
11  11  d  7   10

在table2，每個type內，我想組裝所有可能最大，非重疊集合（儘管這不能由任何set S上的正確type的填充間隙是沒關係）。期望的輸出：

table2: 
row_id type group_id set_id 
1  a  1   1 
2  a  1   2 
3  a  2   1 
4  a  2   3 
5  a  3   3 
6  a  3   4 
7  b  4   5 
8  c  5   6 
9  c  6   7 
10  d  7   8 
11  d  7   9 
12  d  7   10 
13  d  7   11

我目前的想法是使用的事實，有一個可能的數量有限的事實。步驟：

查找table1含價值1所有集合。將它們複製到table2。
找到table1中的所有組，包含值2，而不是在table2中。
加入從套（2）與上table1type，set_id，以及具有min_value大於group的最大max_value。
對於未加入（3）的（2）的集合，將它們插入到table2中。這些將開始新的group，稍後可能會延長。
對value s 3到10重複步驟（2）到（4）。

我認爲這會工作，但它有很多的痛苦中最對接的步驟，尤其是對（2） - 尋找套不table2，以及（4） - 找到設置沒有加入。

你知道更快，更有效的方法嗎？我的實際數據有數百萬個set s，成千上萬個type s和數百個value（儘管幸運的是，如示例中那樣，value是有界的），所以可伸縮性是必不可少的。

我正在使用PLSQL Developer與Oracle 10g（不像我之前所說 - 謝謝IT部門）。謝謝！

來源

2014-11-14 esa606

你的例子只有有兩個元素的結果。可以有兩個以上嗎？ – 2014-11-14 19:32:36

是的！這只是爲了簡單。可能是從'1'到'10'的每個「值」都是一個單獨的集合，因此是一個單獨的元素。我會更新示例數據以顯示此內容。 – esa606 2014-11-14 19:35:18

我認爲你的過程很好，除了第1步和第2步。你可以用table1中的每一組初始化表。 – 2014-11-14 19:47:25

對於Oracle 10g，您不能使用遞歸CTE，但通過一些工作，您可以使用類似於connect by語法的方法。首先，你需要生成具有所有非重疊的鏈接，你可以做一個CTE或在線查看：

select t1.type, t1.set_id, t1.min_value, t1.max_value, 
    t2.set_id as next_set_id, t2.min_value as next_min_value, 
    t2.max_value as next_max_value, 
    row_number() over (order by t1.type, t1.set_id, t2.set_id) as group_id 
from table1 t1 
left join table1 t2 on t2.type = t1.type 
and t2.min_value > t1.max_value 
where not exists (
    select 1 
    from table1 t4 
    where t4.type = t1.type 
    and t4.min_value > t1.max_value 
    and t4.max_value < t2.min_value 
) 
order by t1.type, group_id, t1.set_id, t2.set_id;

這花了一個實驗位，它當然有可能我已經錯過或丟失有關過程中的規則的一些事情;但是，讓你12的僞行，在我以前的答案，這允許開始a/1兩個獨立鏈同時限制d值的單鏈應遵循：

TYPE SET_ID MIN_VALUE MAX_VALUE NEXT_SET_ID NEXT_MIN_VALUE NEXT_MAX_VALUE GROUP_ID 
---- ------ ---------- ---------- ----------- -------------- -------------- -------- 
a   1   1   3   2    4    10  1 
a   1   1   3   3    6    10  2 
a   2   4   10             3 
a   3   6   10             4 
a   4   2   5   3    6    10  5 
b   5   1   9             6 
c   6   1   7             7 
c   7   3   10             8 
d   8   1   2   9    3    3  9 
d   9   3   3   10    4    5  10 
d  10   4   5   11    7    10  11 
d  11   7   10             12

這可以作爲CTE;查詢該用連接-通過循環：

with t as (
    ... -- same as above query 
) 
select t1.type, 
    dense_rank() over (partition by null 
    order by connect_by_root group_id) as group_id, 
    t1.set_id 
from t t1 
connect by type = prior type 
and set_id = prior next_set_id 
start with not exists (
    select 1 from table1 t2 
    where t2.type = t1.type 
    and t2.max_value < t1.min_value 
) 
and not exists (
    select 1 from t t3 
    where t3.type = t1.type 
    and t3.next_max_value < t1.next_min_value 
) 
order by t1.type, group_id, t1.min_value;

的dense_rank()使組ID相鄰的;不確定你是否真的需要這些，或者他們的順序很重要，所以它是真正的可選項。 connect_by_root給出鏈的起始組ID，因此雖然初始查詢中有12行和12 group_id值，但它們並不全部出現在最終結果中。

連接通過兩個prior值，類型和在初始查詢中找到的下一個集ID。這創造了所有的連鎖店，但擁有自己的連鎖店也將包括更短的連鎖店 - 對於d，你會看到8,9,10,11，而且還有9,10,11和10,11，您不希望它們作爲單獨的組。這些被start with條件消除，這可能會被簡化。

這給：

TYPE GROUP_ID SET_ID 
---- -------- ------ 
a   1  1 
a   1  2 
a   2  1 
a   2  3 
a   3  4 
a   3  3 
b   4  5 
c   5  6 
c   6  7 
d   7  8 
d   7  9 
d   7  10 
d   7  11

SQL Fiddle demo。

來源

2014-11-18 10:28:05

輝煌，優雅，容易縮放到我的大型數據集。非常感謝！ – esa606 2014-11-18 17:58:16

如果您可以識別所有組及其起始set_id，那麼您可以使用遞歸方法並在單個語句中執行此操作，而不是需要迭代填充表。但是，爲了提高速度/效率和資源消耗，您需要對這兩種方法進行基準測試 - 無論是針對您的數據量還是在系統的可用資源範圍內進行擴展，都需要進行驗證。

如果我的理解，當你決定開始您就可以使用查詢一次識別所有這些新組：

with t as (
    select t1.type, t1.set_id, t1.min_value, t1.max_value, 
    t2.set_id as next_set_id, t2.min_value as next_min_value, 
    t2.max_value as next_max_value 
    from table1 t1 
    left join table1 t2 on t2.type = t1.type and t2.min_value > t1.max_value 
    where not exists (
    select 1 
    from table1 t3 
    where t3.type = t1.type 
    and t3.max_value < t1.min_value 
) 
) 
select t.type, t.set_id, t.min_value, t.max_value, 
    t.next_set_id, t.next_min_value, t.next_max_value, 
    row_number() over (order by t.type, t.min_value, t.next_min_value) as grp_id 
from t 
where not exists (
    select 1 from t t2 
    where t2.type = t.type 
    and t2.next_max_value < t.next_min_value 
) 
order by grp_id;

棘手位這裏獲得所有三組a，特別是二組以set_id = 1開頭，但d只有一個組。內部select（在CTE中）通過not exists子句查找沒有較低非重疊範圍的集合，並且外連接到同一個表以獲取不重疊的下一個集合，爲您提供了兩個以set_id = 1開頭的組，還有四個以set_id = 9開頭。然後，外部選擇將忽略除第二個子句以外的最低非重疊項 - 但不必再次打到真正的表。

所以，讓你：

TYPE SET_ID MIN_VALUE MAX_VALUE NEXT_SET_ID NEXT_MIN_VALUE NEXT_MAX_VALUE GRP_ID 
---- ------ ---------- ---------- ----------- -------------- -------------- ------ 
a   1   1   3   2    4    10  1 
a   1   1   3   3    6    10  2 
a   4   2   5   3    6    10  3 
b   5   1   9            4 
c   6   1   7            5 
c   7   3   10            6 
d   8   1   2   9    3    3  7

然後，您可以使用它作爲錨定構件在recursive subquery factoring clause：

with t as (
    ... 
), 
r (type, set_id, min_value, max_value, 
    next_set_id, next_min_value, next_max_value, grp_id) as (
    select t.type, t.set_id, t.min_value, t.max_value, 
    t.next_set_id, t.next_min_value, t.next_max_value, 
    row_number() over (order by t.type, t.min_value, t.next_min_value) 
    from t 
    where not exists (
    select 1 from t t2 
    where t2.type = t.type 
    and t2.next_max_value < t.next_min_value 
) 
    ...

如果你離開了r CTE與和剛剛做你」 d得到相同的七個小組。

遞歸部件然後使用從該查詢以各組的下一個成員的下set_id其範圍，並且重複外連接/不存在查找找到下一個（多個）集合再次;停車時沒有下一組不重疊：

... 
    union all 
    select r.type, r.next_set_id, r.next_min_value, r.next_max_value, 
    t.set_id, t.min_value, t.max_value, r.grp_id 
    from r 
    left join table1 t 
    on t.type = r.type 
    and t.min_value > r.next_max_value 
    and not exists (
    select 1 from table1 t2 
    where t2.type = r.type 
    and t2.min_value > r.next_max_value 
    and t2.max_value < t.min_value 
) 
    where r.next_set_id is not null -- to stop looking when you reach a leaf node 
) 
...

最後您有一個基於遞歸CTE得到你想要的列和指定的順序查詢：

... 
select r.type, r.grp_id, r.set_id 
from r 
order by r.type, r.grp_id, r.min_value;

它得到：

TYPE  GRP_ID  SET_ID 
---- ---------- ---------- 
a    1   1 
a    1   2 
a    2   1 
a    2   3 
a    3   4 
a    3   3 
b    4   5 
c    5   6 
c    6   7 
d    7   8 
d    7   9 
d    7   10 
d    7   11

SQL Fiddle demo。

如果您想要顯示每組的最小/最大值，並且可以跟蹤並顯示每組的最小/最大值。我只是顯示了問題的列，但是。

來源

2014-11-17 15:14:49

這看起來非常棒！不幸的是，經過幾個小時的努力，我意識到我的IT在版本上是錯誤的 - 真的是10g。有什麼辦法可以將它轉換爲10g，因爲遞歸子查詢因子是11gR2的新增功能？ – esa606 2014-11-17 21:17:41

SQL：組裝非重疊集合

回答

相關問題