I think you will have to use some form of recursion, regardless of which language you choose. Here is an example of how you could do it in SQL:
-- Sample data in a table variable (the name @T is arbitrary).
DECLARE @T TABLE (ID INT, start_date DATE, end_date DATE);
INSERT @T VALUES (1, '2012-03-15', '2012-04-02')
, (1, '2012-04-05', '2012-05-12')
, (1, '2012-04-12', '2012-05-21')
, (2, '2012-03-05', '2012-06-13')
, (3, '2012-03-09', '2012-03-19')
, (3, '2012-04-03', '2012-05-02')
, (3, '2012-05-01', '2012-08-01')
, (3, '2012-05-16', '2012-08-02')
, (3, '2012-06-08', '2012-09-09');
WITH T AS (
-- Anchor: the earliest row for each id.
SELECT id, start_date, end_date, RN
FROM (
SELECT id, start_date, end_date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date) RN
FROM @T) S
WHERE RN = 1
UNION ALL
-- Recursive step: take the next row for the same id. If it starts within
-- 31 days of the carried-forward start date, keep the carried-forward dates;
-- otherwise this row starts a new group.
SELECT S.id
, CASE WHEN DATEDIFF(dd, T.start_date, S.start_date) <= 31 THEN T.start_date ELSE S.start_date END
, CASE WHEN DATEDIFF(dd, T.start_date, S.start_date) <= 31 THEN T.end_date ELSE S.end_date END
, S.RN
FROM (
SELECT id, start_date, end_date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date) RN
FROM @T) S
JOIN T ON T.id = S.id
WHERE S.RN = T.RN+1
)
-- Rows absorbed into a group repeat that group's dates; GROUP BY collapses them.
SELECT id, start_date, end_date
FROM T
GROUP BY id, start_date, end_date;
This works fine as long as the sample size is not large, but if you are looking at a lot of rows it may not be the most efficient approach.
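Can you have duplicate start_dates for each ID? (If so, how do you decide which one is the duplicate?) – ZLK
No. For the same start_date and the same id, there will be no duplicates. – breezymri
You could brute-force it in Python with very little code: iterate over all the rows, and if the date is < 31 days from the previous row and the id is the same, delete the row. That is easy to code. After that it comes down to performance; iteration in Python can be slow. – flyingmeatball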
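For what it's worth, the brute-force idea in the last comment might look roughly like the sketch below. It is only an illustration under a few assumptions: the function name `drop_close_rows` and the parameter `max_gap_days` are made up here, rows are plain `(id, start_date, end_date)` tuples, and each row is compared against the start date of the last row that was kept for that id, using `<= 31` days so that it mirrors the SQL above rather than a strict "previous row" check.

```python
from datetime import date


def drop_close_rows(rows, max_gap_days=31):
    """Keep the first row per id, then drop any later row whose start date
    falls within max_gap_days of the last kept row for the same id.

    Hypothetical helper for illustration; rows are (id, start, end) tuples.
    """
    kept = []
    last_start = {}  # id -> start date of the most recently kept row
    for row_id, start, end in sorted(rows, key=lambda r: (r[0], r[1])):
        anchor = last_start.get(row_id)
        if anchor is not None and (start - anchor).days <= max_gap_days:
            continue  # too close to the last kept row: drop it
        kept.append((row_id, start, end))
        last_start[row_id] = start
    return kept


# Same sample data as in the SQL example.
rows = [
    (1, date(2012, 3, 15), date(2012, 4, 2)),
    (1, date(2012, 4, 5), date(2012, 5, 12)),
    (1, date(2012, 4, 12), date(2012, 5, 21)),
    (2, date(2012, 3, 5), date(2012, 6, 13)),
    (3, date(2012, 3, 9), date(2012, 3, 19)),
    (3, date(2012, 4, 3), date(2012, 5, 2)),
    (3, date(2012, 5, 1), date(2012, 8, 1)),
    (3, date(2012, 5, 16), date(2012, 8, 2)),
    (3, date(2012, 6, 8), date(2012, 9, 9)),
]

result = drop_close_rows(rows)
for row in result:
    print(row)
```

This is a single pass after sorting, so the cost is dominated by the sort. Whether that is fast enough for a large table would need to be measured.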