2012-08-16 58 views
1

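Islands and gaps (T-SQL)

I have been struggling with a problem that should actually be pretty simple, but after a full week of reading, searching, experimenting and so on, my colleague and I cannot find a proper solution. :(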

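The problem: we have a table with two values:

  • an employee number (P_ID, int) <--- identification of the employee
  • a date (starttime, datetime) <--- the time the employee checked in
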
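  • We need to know which periods each employee has been working.
  • When two dates are less than @gap days apart, they belong to the same period.
  • For each employee there can be multiple records for any given day, but I only need to know on which dates he worked; I am not interested in the time part.
  • As soon as there is a gap > @gap days, the next date is considered the start of a new range.
  • A range is at least 1 day (example: 21-9-2011 | 21-09-2011) but has no maximum length. (An employee checking in every @gap - 1 days should result in one period from the first day he checked in until today.)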

What we think we need are the islands in this table, where an island ends when the gap in days is greater than @variable (@gap = 30 means 30 days).
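To make the gap rule concrete, here is a minimal sketch of how `DATEDIFF(day, ...)` treats a few of the sample check-ins. It counts calendar-day boundaries, so the time part is ignored. The column aliases are illustrative only. The rule above leaves a gap of exactly @gap days unspecified.

```sql
-- Sketch only: DATEDIFF(day, ...) counts crossed day boundaries,
-- so the time of day does not affect the result.
SELECT
    -- 28-06-2011 to 21-07-2011: 23 days apart, same period when @gap = 30
    DATEDIFF(day, '2011-06-28T11:00:00', '2011-07-21T09:00:00') AS days_apart_1,
    -- 21-07-2011 to 21-09-2011: 62 days apart, so a new period starts
    DATEDIFF(day, '2011-07-21T09:00:00', '2011-09-21T12:00:00') AS days_apart_2,
    -- two check-ins on the same calendar day: 0 days apart
    DATEDIFF(day, '2011-06-27T10:00:00', '2011-06-27T10:30:00') AS days_apart_3;
```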

So, an example:

SourceTable

-----P_ID----|----starttime----
12121 | 24-03-2009 7:30
12121 | 24-03-2009 14:25
12345 | 27-06-2011 10:00
99999 | 01-05-2012 4:50
12345 | 27-06-2011 10:30
12345 | 28-06-2011 11:00
98765 | 13-04-2012 10:00
12345 | 21-07-2011 9:00
99999 | 03-05-2012 23:15
12345 | 21-09-2011 12:00
45454 | 12-07-2010 8:00
12345 | 21-09-2011 17:00
99999 | 06-05-2012 11:05
99999 | 20-05-2012 12:45
98765 | 26-04-2012 16:00
12345 | 07-07-2012 14:00
99999 | 01-06-2012 13:55
12345 | 13-08-2012 13:00

Now the result I need is:

Periods

-----P_ID----|----start----|----end----
12121 | 24-03-2009 | 24-03-2009
12345 | 27-06-2012 | 21-07-2012
12345 | 21-09-2012 | 21-09-2012
12345 | 07-07-2012 | (today) OR 13-08-2012 <- (less than @gap days ago) OR (last date in the table)
45454 | 12-07-2010 | 12-07-2010
45454 | 17-06-2012 | 17-06-2012
98765 | 13-04-2012 | 26-04-2012
99999 | 01-05-2012 | 01-06-2012

I hope it is clear this way. Thank you for reading this far; it would be great if you could contribute!

+0

What value of `@gap` applies to the result set above? – 2012-08-16 09:35:44

+0

Your result set doesn't make sense. Can you explain the result set entries for 12345? – 2012-08-16 10:08:40

+0

I don't think the result set is quite right for 12345 (should be 4 rows) or 45454 (should be 1 row). – 2012-08-16 10:21:40

Answers

0

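Jon most definitely pointed us in the right direction. The performance was horrible, though (4 million records in the database), and it looked like we were missing some information. With everything we learned from you, we came up with the solution below. It uses elements of all the suggested answers and works through 3 temp tables before finally producing the results, but the performance is good enough, and so is the data it generates.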

declare @gap int 
declare @Employee_id int 

set @gap = 30 
set dateformat dmy 
--------------------------------------------------------------- #temp1 -------------------------------------------------- 
CREATE TABLE #temp1 (EmployeeID int, starttime date) 
INSERT INTO #temp1 (EmployeeID, starttime) 

select distinct ck.Employee_id, 
       cast(ck.starttime as date) 
from SERVER1.DB1.dbo.checkins ck -- alias must match the ck.* references above and in the join
     inner join SERVER1.DB1.dbo.Team t on ck.team_id = t.id 
where t.productive = 1 

--------------------------------------------------------------- #temp2 -------------------------------------------------- 

create table #temp2 (ROWNR int, Employeeid int, ENDOFCHECKIN datetime, FIRSTCHECKIN datetime) 
INSERT INTO #temp2 

select Row_number() OVER (partition by EmployeeID ORDER BY t.prev) + 1 as ROWNR, 
      EmployeeID, 
      DATEADD(DAY, 1, t.Prev) AS start_gap, 
      DATEADD(DAY, 0, t.next) AS end_gap 
from 
      (
        select a.EmployeeID, 
            a.starttime as Prev, 
            (
            select min(b.starttime) 
            from #temp1 as b 
            where starttime > a.starttime and b.EmployeeID = a.EmployeeID 
           ) as Next 
from #temp1 as a) as t 

where datediff(day, prev, next) > @gap -- use the variable instead of a hard-coded 30
group by  EmployeeID, 
        t.Prev, 
        t.next 
union -- add first known date for Employee 

select  1 as ROWNR, 
      EmployeeID, 
      NULL, 
      min(starttime) 
from #temp1 ct 
group by ct.EmployeeID 

--------------------------------------------------------------- #temp3 -------------------------------------------------- 

create table #temp3 (ROWNR int, Employeeid int, ENDOFCHECKIN datetime, STARTOFCHECKIN datetime) 
INSERT INTO #temp3 

select ROWNR, 
     Employeeid, 
     ENDOFCHECKIN, 
     FIRSTCHECKIN 
from #temp2 

union -- add last known date for Employee 

select  (select count(*) from #temp2 b where Employeeid = ct.Employeeid)+1 as ROWNR, 
      ct.Employeeid, 
      (select dateadd(d,1,max(starttime)) from #temp1 c where Employeeid = ct.Employeeid), 
      NULL 
from #temp2 ct 
group by ct.EmployeeID 

---------------------------------------finally check our data------------------------------------------------- 


select    a1.Employeeid, 
        a1.STARTOFCHECKIN as STARTOFCHECKIN, 
        ENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN a1.ENDOFCHECKIN ELSE b1.ENDOFCHECKIN END, 
        year(a1.STARTOFCHECKIN) as JaarSTARTOFCHECKIN, 
        JaarENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN year(a1.ENDOFCHECKIN) ELSE year(b1.ENDOFCHECKIN) END, 
        Month(a1.STARTOFCHECKIN) as MaandSTARTOFCHECKIN, 
        MaandENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN month(a1.ENDOFCHECKIN) ELSE month(b1.ENDOFCHECKIN) END, 
        (year(a1.STARTOFCHECKIN)*100)+month(a1.STARTOFCHECKIN) as JaarMaandSTARTOFCHECKIN, 
        JaarMaandENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN (year(a1.ENDOFCHECKIN)*100)+month(a1.ENDOFCHECKIN) ELSE (year(b1.ENDOFCHECKIN)*100)+month(b1.ENDOFCHECKIN) END, 
        datediff(M,a1.STARTOFCHECKIN,b1.ENDOFCHECKIN) as MONTHSCHECKEDIN 
from #temp3 a1 
     full outer join #temp3 b1 on a1.ROWNR = b1.ROWNR -1 and a1.Employeeid = b1.Employeeid 
where not (a1.STARTOFCHECKIN is null AND b1.ENDOFCHECKIN is null) 
order by a1.Employeeid, a1.STARTOFCHECKIN 
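
For comparison, the same grouping can be sketched as a single statement with window functions. This is not the method used above. It is a minimal sketch that assumes SQL Server 2012 or later (for `LAG` and a running `SUM ... OVER`), and the table variable `@checkins`, the CTE names and the column aliases are hypothetical. It also assumes that a gap of exactly @gap days still belongs to the same period.

```sql
DECLARE @gap int = 30;

-- Hypothetical sample: the rows for employee 12345 from the question
DECLARE @checkins TABLE (P_ID int, starttime datetime);
INSERT @checkins VALUES
    (12345, '20110627 10:00'), (12345, '20110627 10:30'),
    (12345, '20110628 11:00'), (12345, '20110721 09:00'),
    (12345, '20110921 12:00'), (12345, '20110921 17:00'),
    (12345, '20120707 14:00'), (12345, '20120813 13:00');

;WITH work_days AS (
    -- one row per employee per calendar day; the time part is dropped
    SELECT DISTINCT P_ID, CAST(starttime AS date) AS workday
    FROM @checkins
),
flagged AS (
    -- is_start = 1 on the first day and whenever the previous day is more than @gap days back
    SELECT P_ID, workday,
           CASE WHEN DATEDIFF(day,
                              LAG(workday) OVER (PARTITION BY P_ID ORDER BY workday),
                              workday) <= @gap
                THEN 0 ELSE 1 END AS is_start
    FROM work_days
),
numbered AS (
    -- a running total of the start flags gives each period its own number
    SELECT P_ID, workday,
           SUM(is_start) OVER (PARTITION BY P_ID ORDER BY workday
                               ROWS UNBOUNDED PRECEDING) AS period_no
    FROM flagged
)
SELECT P_ID, MIN(workday) AS period_start, MAX(workday) AS period_end
FROM numbered
GROUP BY P_ID, period_no
ORDER BY P_ID, period_start;
```

With the sample rows this should give four periods for 12345, starting on 27-06-2011, 21-09-2011, 07-07-2012 and 13-08-2012. Each day is compared only with the day before it, so no self-join on the check-in table is needed.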
1

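I've put together a rough script that should get you started. I haven't bothered refining the datetimes, and the endpoint comparisons might need tweaking.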

select 
    P_ID, 
    src.starttime, 
    endtime = case when src.starttime <> lst.starttime or lst.starttime < DATEADD(dd,-1 * @gap,GETDATE()) then lst.starttime else GETDATE() end, 
    frst.starttime, 
    lst.starttime 
from @SOURCETABLE src 
outer apply (select starttime = MIN(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime > DATEADD(dd,-1 * @gap,src.starttime)) frst 
outer apply (select starttime = MAX(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and src.starttime > DATEADD(dd,-1 * @gap,sub.starttime)) lst 
where src.starttime = frst.starttime 
order by P_ID, src.starttime 

I get the output below, which is a bit different from yours, but I think it's OK:

P_ID  starttime    endtime     starttime    starttime 
----------- ----------------------- ----------------------- ----------------------- ----------------------- 
12121  2009-03-24 07:30:00.000 2009-03-24 14:25:00.000 2009-03-24 07:30:00.000 2009-03-24 14:25:00.000 
12345  2011-06-27 10:00:00.000 2011-07-21 09:00:00.000 2011-06-27 10:00:00.000 2011-07-21 09:00:00.000 
12345  2011-09-21 12:00:00.000 2011-09-21 17:00:00.000 2011-09-21 12:00:00.000 2011-09-21 17:00:00.000 
12345  2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 
12345  2012-08-13 13:00:00.000 2012-08-16 11:23:25.787 2012-08-13 13:00:00.000 2012-08-13 13:00:00.000 
45454  2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 
98765  2012-04-13 10:00:00.000 2012-04-26 16:00:00.000 2012-04-13 10:00:00.000 2012-04-26 16:00:00.000 

The last two output columns are the results of the outer apply sections and are only there for debugging.

This is based on the following setup:

declare @gap int 
set @gap = 30 

set dateformat dmy 
-----P_ID----|----starttime---- 
declare @SOURCETABLE table (P_ID int, starttime datetime) 
insert @SourceTable values 
(12121,'24-03-2009 7:30'), 
(12121,'24-03-2009 14:25'), 
(12345,'27-06-2011 10:00'), 
(12345,'27-06-2011 10:30'), 
(12345,'28-06-2011 11:00'), 
(98765,'13-04-2012 10:00'), 
(12345,'21-07-2011 9:00'), 
(12345,'21-09-2011 12:00'), 
(45454,'12-07-2010 8:00'), 
(12345,'21-09-2011 17:00'), 
(98765,'26-04-2012 16:00'), 
(12345,'07-07-2012 14:00'), 
(12345,'13-08-2012 13:00') 

UPDATE: A slight rethink. This now uses a CTE to work out the gaps forwards and backwards from each item, and then aggregates those:

--Get the gap between each starttime and the next and prev (use 999 to indicate non-closed intervals) 
;WITH CTE_Gaps As ( 
    select 
     p_id, 
     src.starttime, 
     nextgap = coalesce(DATEDIFF(dd,src.starttime,nxt.starttime),999), --Gap to the next entry 
     prevgap = coalesce(DATEDIFF(dd,prv.starttime,src.starttime),999), --Gap to the previous entry 
     isold = case when DATEDIFF(dd,src.starttime,getdate()) > @gap then 1 else 0 end --Is starttime more than gap days ago? 
    from 
     @SOURCETABLE src 
     cross apply (select starttime = MIN(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime > src.starttime) nxt 
     cross apply (select starttime = max(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime < src.starttime) prv 
) 
--select * from CTE_Gaps 
select 
     p_id, 
     starttime = min(gap.starttime), 
     endtime = nxt.starttime 
    from 
     CTE_Gaps gap 
     --Find the next starttime where its gap to the next > @gap 
     cross apply (select starttime = MIN(sub.starttime) from CTE_Gaps sub where gap.p_id = sub.p_id and sub.starttime >= gap.starttime and sub.nextgap > @gap) nxt 
group by P_ID, nxt.starttime 
order by P_ID, nxt.starttime 
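
The query above computes `isold` but does not use it in the final select, so the rule from the question that a recent last check-in should extend the period to today is not applied. A minimal sketch of one way to apply it afterwards is shown here. The `@periods` table variable and its column names are hypothetical, and `@today` is fixed only to make the example repeatable (in real use it would come from `GETDATE()`).

```sql
DECLARE @gap int = 30;
DECLARE @today date = '20120816';  -- assumption: fixed date for a repeatable example

-- Hypothetical input: periods that have already been computed per employee
DECLARE @periods TABLE (P_ID int, period_start date, period_end date);
INSERT @periods VALUES
    (12345, '20120707', '20120707'),
    (12345, '20120813', '20120813'),
    (45454, '20100712', '20100712');

SELECT
    p.P_ID,
    p.period_start,
    CASE
        -- only the employee's most recent period can still be open
        WHEN p.period_end = MAX(p.period_end) OVER (PARTITION BY p.P_ID)
             AND DATEDIFF(day, p.period_end, @today) < @gap
        THEN @today
        ELSE p.period_end
    END AS period_end
FROM @periods AS p
ORDER BY p.P_ID, p.period_start;
```

Here only the 13-08-2012 period of 12345 is extended to @today, because it is that employee's latest period and ended less than @gap days before @today. The other rows keep their original end date.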
+0

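Jon, this code does what we are looking for, except for one thing: when there are no gaps larger than the specified period, we get wrong results. I'll give an example. John has been working with us since February 9, 2009 and has never been away for more than 10 days. When we run this script, he shows up with one period: the start date is the day he first checked in, and the end date is (startdate + @gap) instead of today or the date he last checked in. So when there are no gaps larger than @gap, the dates shown are always (startdate, startdate + @gap) instead of (startdate, today). How can we compensate for that? – Henrov 2012-08-16 13:27:07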

+1

Include this use case in the sample data in the question, and show the desired output. – 2012-08-16 13:27:44

+0

Thanks Jon! We have added P_ID 99999. – Henrov 2012-08-16 13:43:07