2013-12-18

Here is my problem. I have a MySQL table with the following columns and sample data, containing complex overlapping periods:

id | user | starting date | ending date | activity code 
1 | Andy | 2010-04-01 | 2010-05-01 | 3 
2 | Andy | 1988-11-01 | 1991-03-01 | 3 
3 | Andy | 2005-06-01 | 2008-08-01 | 3 
4 | Andy | 2005-08-01 | 2008-11-01 | 3 
5 | Andy | 2005-06-01 | 2010-05-01 | 4 
6 | Ben | 2010-03-01 | 2011-06-01 | 3 
7 | Ben | 2010-03-01 | 2010-05-01 | 4 
8 | Ben | 2005-04-01 | 2011-05-01 | 3 

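As you can see in this table, a user can have the same activity code over similar date periods. For the same user, periods may or may not overlap, and the table may contain several overlapping periods.

What I want is a MySQL query that returns the following result: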

new id | user | starting date | ending date | activity code 
1 | Andy | 2010-04-01 | 2010-05-01 | 3 => ok, no overlap period 
2 | Andy | 1988-11-01 | 1991-03-01 | 3 => ok, no overlap period 
3 | Andy | 2005-06-01 | 2008-11-01 | 3 => same user, same activity but ending date coming from row 4 as extended period 
4 | Andy | 2005-06-01 | 2010-05-01 | 4 => ok other activity code 
5 | Ben | 2005-04-01 | 2011-06-01 | 3 => ok other user, but as overlap period rows 6 and 8 for the same user and activity, I take the widest range 
6 | Ben | 2010-03-01 | 2010-05-01 | 4 => ok other activity for second user 

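In other words, for the same user and activity code, if there is no overlap I need the starting and ending dates as they are. If periods overlap for the same user and activity code, I need the earliest starting date and the latest ending date from the related rows. I need this for all users and activity codes in the table, in SQL for MySQL.

I hope this is clear and that someone can help me; I have tried various pieces of code from solutions offered on this site and elsewhere, without success.

Answers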

0

I have a somewhat convoluted (and strictly MySQL-specific) solution:

-- Session variables tracking the current group and the running interval.
SET @user = NULL;
SET @activity = NULL;
SET @interval_id = 0;
SET @interval_end = NULL; -- reset so a rerun in the same session starts clean

SELECT
    MIN(inn.`starting date`) AS start,
    MAX(inn.`ending date`) AS end,
    inn.user,
    inn.`activity code`
FROM
    (SELECT
        -- New user or activity: start a new interval and reset the running end.
        IF(user <> @user OR `activity code` <> @activity,
            @interval_id := @interval_id + 1, NULL),
        IF(user <> @user OR `activity code` <> @activity,
            @interval_end := STR_TO_DATE('',''), NULL),
        @user := user,
        @activity := `activity code`,
        -- A row that starts after the running end opens a new interval.
        @interval_id := IF(`starting date` > @interval_end,
            @interval_id + 1,
            @interval_id) AS interval_id,
        -- An overlapping row extends the running end; otherwise use its own end.
        @interval_end := IF(`starting date` < @interval_end,
            GREATEST(@interval_end, `ending date`),
            `ending date`) AS interval_end,
        t.*
    FROM Table1 t
    ORDER BY t.user, t.`activity code`, t.`starting date`, t.`ending date`) inn
GROUP BY inn.user, inn.`activity code`, inn.interval_id;

The basic idea is shamelessly borrowed from the first answer to this question.

You can use this SQL Fiddle to view the results and try different source data.
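For readers who find the user-variable query hard to follow, the same sweep can be sketched outside SQL. This is only an illustration in Python, not part of the answer: the function name `merge_periods` and the tuple layout are assumptions made for this sketch, and it relies on ISO-formatted dates comparing correctly as strings. Like the query above, it sorts by user, activity code and starting date, then opens a new interval whenever a row starts after the running end.

```python
from itertools import groupby

# Sample rows from the question: (id, user, start, end, activity code).
ROWS = [
    (1, "Andy", "2010-04-01", "2010-05-01", 3),
    (2, "Andy", "1988-11-01", "1991-03-01", 3),
    (3, "Andy", "2005-06-01", "2008-08-01", 3),
    (4, "Andy", "2005-08-01", "2008-11-01", 3),
    (5, "Andy", "2005-06-01", "2010-05-01", 4),
    (6, "Ben", "2010-03-01", "2011-06-01", 3),
    (7, "Ben", "2010-03-01", "2010-05-01", 4),
    (8, "Ben", "2005-04-01", "2011-05-01", 3),
]


def merge_periods(rows):
    """Merge overlapping periods per (user, activity code).

    Hypothetical helper for illustration; dates are ISO strings,
    so plain string comparison orders them chronologically.
    """
    ordered = sorted(rows, key=lambda r: (r[1], r[4], r[2], r[3]))
    merged = []
    for (user, activity), group in groupby(ordered, key=lambda r: (r[1], r[4])):
        current_start = current_end = None
        for _id, _user, start, end, _activity in group:
            if current_end is None or start > current_end:
                # Gap before this row: close the running interval, open a new one.
                if current_end is not None:
                    merged.append((user, current_start, current_end, activity))
                current_start, current_end = start, end
            else:
                # Overlap (or touching dates): extend the running interval.
                current_end = max(current_end, end)
        merged.append((user, current_start, current_end, activity))
    return merged


for row in merge_periods(ROWS):
    print(row)
```

On the sample data this yields six merged rows, matching the result table in the question apart from row order.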

+0

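Thank you very much PM, it works great. Honestly, this is really too complicated for me... I clearly still have a lot to learn in SQL. I only modified your code to handle the case where there is no ending date because the activity is still ongoing ('0000-00-00' as the ending date in the table). Thanks again – user3115576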
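The comment above does not show how the query was changed for ongoing activities, so the following is only one possible approach, sketched in Python: treat '0000-00-00' as a far-future date while merging, then restore the marker afterwards. The sentinel `9999-12-31` and the function name `merge_with_open_end` are assumptions made for this sketch, not something taken from the thread.

```python
OPEN_END = "0000-00-00"    # marker mentioned in the comment for "still ongoing"
FAR_FUTURE = "9999-12-31"  # hypothetical sentinel, chosen only for this sketch


def merge_with_open_end(periods):
    """Merge (start, end) ISO-date pairs belonging to one user and activity.

    An end of '0000-00-00' is treated as later than any real date.
    """
    normalized = sorted(
        (start, FAR_FUTURE if end == OPEN_END else end) for start, end in periods
    )
    merged = []
    for start, end in normalized:
        if merged and start <= merged[-1][1]:
            # Overlaps the running interval: keep the later end.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Map the sentinel back to the table's open-end marker.
    return [(s, OPEN_END if e == FAR_FUTURE else e) for s, e in merged]


print(merge_with_open_end([("2010-03-01", "0000-00-00"), ("2005-04-01", "2011-05-01")]))
```

Without such a substitution, an ending date of '0000-00-00' would sort before every real date, so an ongoing period would never extend the merged range.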

0

Here is a solution (see http://sqlfiddle.com/#!2/fda3d/15):

SELECT DISTINCT summarized.`user` 
    , summarized.activity_code 
    , summarized.true_begin 
    , summarized.true_end 
FROM (
    SELECT t1.id,t1.`user`,t1.activity_code 
    , MIN(LEAST(t1.`starting`, COALESCE(overlap.`starting` ,t1.`starting`))) as true_begin 
    , MAX(GREATEST(t1.`ending`, COALESCE(overlap.`ending` ,t1.`ending`))) as true_end 
    FROM t1 
    LEFT JOIN t1 AS overlap 
    ON t1.`user` = overlap.`user` 
     AND t1.activity_code = overlap.activity_code 
     AND overlap.`ending` >= t1.`starting` 
     AND overlap.`starting` <= t1.`ending` 
     AND overlap.id <> t1.id 
    GROUP BY t1.id, t1.`user`, t1.activity_code) AS summarized; 

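I don't know how it will perform on a large data set with many overlaps. You will definitely need an index on the user and activity_code fields, possibly with the starting and ending date fields as part of that index as well.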
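One behaviour worth checking before relying on the self-join is how it treats chains of periods. Each row is widened only by the rows that overlap it directly, so when A overlaps B and B overlaps C but A does not overlap C, the three rows end up with three different ranges. The sketch below emulates the join condition in Python to show this; the function name `pairwise_ranges` and the `CHAIN` data are hypothetical and do not come from the question.

```python
def pairwise_ranges(periods):
    """Emulate the self-join for one user and activity code.

    Each period is widened using only the periods that overlap it directly,
    then duplicates are dropped (the DISTINCT in the outer query).
    """
    results = set()
    for i, (start, end) in enumerate(periods):
        true_begin, true_end = start, end
        for j, (o_start, o_end) in enumerate(periods):
            # Same test as the ON clause: a different row that overlaps this one.
            if i != j and o_end >= start and o_start <= end:
                true_begin = min(true_begin, o_start)
                true_end = max(true_end, o_end)
        results.add((true_begin, true_end))
    return sorted(results)


# Hypothetical chain: A overlaps B, B overlaps C, A and C do not overlap.
CHAIN = [
    ("2005-01-01", "2006-01-01"),
    ("2005-12-01", "2007-01-01"),
    ("2006-12-01", "2008-01-01"),
]

for begin, finish in pairwise_ranges(CHAIN):
    print(begin, finish)
```

On the question's sample data the periods overlap only in pairs, so this limitation does not show up there; it matters only if longer chains can occur in the real table.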

+0

I tried it, but it did not give the expected result the way PM77-1's did; thank you very much for your time anyway. – user3115576

+0

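@user3115576 - It really doesn't give the correct result? In the link I gave you, doesn't it return the result set you need, or do you have additional data for which it doesn't work? – AgRizzo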

+0

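My sincere apologies, AgRizzo... it works too. I gave a simplified example with explicit column names and data. When I tried to apply your code, I made some mistakes adapting it. I have fixed them, and it works just like PM's code... I will monitor the performance of the two solutions as my database grows. Thanks a lot, and apologies again – user3115576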