2012-10-29

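I'm trying to find an efficient way to compare two rows in SQL Server 2008. I need to write a query that finds all rows in the Movement table where Speed < 10 for N consecutive rows. Should I compare each row with the previous one?

The table structure is:

EventTime, Speed

If the data is: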

2012-02-05 13:56:36.980, 2 
2012-02-05 13:57:36.980, 11 
2012-02-05 13:57:46.980, 2 
2012-02-05 13:59:36.980, 2 
2012-02-05 14:06:36.980, 22 
2012-02-05 15:56:36.980, 2 

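Then, if I looked for 2 consecutive rows, it would return rows 3 and 4 (13:57:46.980 / 13:59:36.980), and if I looked for three consecutive rows it would return nothing. The data is ordered by EventTime (a DateTime) only.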

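Any help would be great. I was considering a cursor, but cursors are usually very inefficient. Also, this table has about 10m rows, so the more efficient the better! :)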

Thanks!
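The "compare with the previous row" idea does not need a cursor that looks rows up one by one: a single pass in EventTime order is enough. A minimal Python sketch of that logic, assuming the rows are already in memory as (EventTime, Speed) tuples; `slow_runs`, `limit` and `data` are hypothetical names used only for illustration:

```python
from typing import List, Tuple

Row = Tuple[str, int]  # (EventTime as an ISO string, Speed)


def slow_runs(rows: List[Row], n: int, limit: int = 10) -> List[Row]:
    """Return every row that belongs to a run of >= n consecutive rows with speed < limit."""
    result: List[Row] = []
    run: List[Row] = []
    # ISO timestamps in one fixed format sort chronologically, so this orders by EventTime
    for row in sorted(rows):
        if row[1] < limit:
            run.append(row)  # slow row: extend the current run
        else:
            if len(run) >= n:
                result.extend(run)
            run = []  # a fast row ends the run
    if len(run) >= n:  # a run may reach the end of the data
        result.extend(run)
    return result


data = [
    ("2012-02-05 13:56:36.980", 2),
    ("2012-02-05 13:57:36.980", 11),
    ("2012-02-05 13:57:46.980", 2),
    ("2012-02-05 13:59:36.980", 2),
    ("2012-02-05 14:06:36.980", 22),
    ("2012-02-05 15:56:36.980", 2),
]

print(slow_runs(data, 2))  # rows 3 and 4
print(slow_runs(data, 3))  # []
```

This reproduces the expected results from the question: rows 3 and 4 for two consecutive rows, nothing for three.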


Is the data grouped by some other field, such as a 'thingy_id', or are there really only these two columns? – MatBailie


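@Dems - There are other columns that will help narrow the result set down to 10k/100k rows, but the solution I'm looking for can't group any further than the basic example above. Nothing else will help; it has to identify the next/previous row from the EventTime and Speed fields alone. – Faraday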

Answers

DECLARE
    @n           INT,
    @speed_limit INT
SELECT
    @n           = 5,   -- minimum number of consecutive slow rows
    @speed_limit = 10

;WITH
    partitioned AS
(
    -- Flag each row: 1 = slow (speed < limit), 0 = not slow
    SELECT
        *,
        CASE WHEN speed < @speed_limit THEN 1 ELSE 0 END AS PartitionID
    FROM
        Movement
)
,
    sequenced AS
(
    -- Number the rows over the whole table and again within each flag value;
    -- inside an unbroken run of slow rows the difference stays constant
    SELECT
        ROW_NUMBER() OVER (                         ORDER BY EventTime) AS MasterSeqID,
        ROW_NUMBER() OVER (PARTITION BY PartitionID ORDER BY EventTime) AS PartIDSeqID,
        *
    FROM
        partitioned
)
,
    filter AS
(
    -- One row per run of slow rows that is at least @n rows long
    SELECT
        MasterSeqID - PartIDSeqID AS GroupID,
        MIN(MasterSeqID)          AS GroupFirstMastSeqID,
        MAX(MasterSeqID)          AS GroupFinalMastSeqID
    FROM
        sequenced
    WHERE
        PartitionID = 1
    GROUP BY
        MasterSeqID - PartIDSeqID
    HAVING
        COUNT(*) >= @n
)
-- Return every row inside each qualifying run
SELECT
    sequenced.*
FROM
    filter
INNER JOIN
    sequenced
        ON  sequenced.MasterSeqID >= filter.GroupFirstMastSeqID
        AND sequenced.MasterSeqID <= filter.GroupFinalMastSeqID
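The query relies on the difference of two ROW_NUMBER() values: inside an unbroken run of slow rows both counters advance together, so MasterSeqID - PartIDSeqID stays constant, and it jumps whenever a fast row intervenes. A minimal Python sketch of the same numbering, assuming in-memory (EventTime, Speed) tuples; `filter_groups` and `select_rows` are hypothetical names, and this only mirrors the CTE logic rather than replacing the query:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (EventTime, Speed)


def filter_groups(rows: List[Row], n: int, limit: int = 10) -> List[Tuple[int, int, int]]:
    """(GroupID, GroupFirstMastSeqID, GroupFinalMastSeqID) for each run of >= n slow rows."""
    part_seq = {0: 0, 1: 0}  # running ROW_NUMBER() per PartitionID
    members: Dict[int, List[int]] = defaultdict(list)
    for master_seq, (_, speed) in enumerate(sorted(rows), start=1):  # MasterSeqID
        partition_id = 1 if speed < limit else 0  # PartitionID
        part_seq[partition_id] += 1  # PartIDSeqID
        if partition_id == 1:
            # Same key for every row of one unbroken slow run
            members[master_seq - part_seq[partition_id]].append(master_seq)
    return [
        (group_id, min(seqs), max(seqs))
        for group_id, seqs in sorted(members.items())
        if len(seqs) >= n  # HAVING COUNT(*) >= @n
    ]


def select_rows(rows: List[Row], n: int, limit: int = 10) -> List[Row]:
    """Equivalent of the final range JOIN back onto `sequenced`."""
    ordered = sorted(rows)
    out: List[Row] = []
    for _, first, final in filter_groups(rows, n, limit):
        out.extend(ordered[first - 1:final])  # MasterSeqID is 1-based
    return out


data = [
    ("2012-02-05 13:56:36.980", 2),
    ("2012-02-05 13:57:36.980", 11),
    ("2012-02-05 13:57:46.980", 2),
    ("2012-02-05 13:59:36.980", 2),
    ("2012-02-05 14:06:36.980", 22),
    ("2012-02-05 15:56:36.980", 2),
]

print(filter_groups(data, 2))  # [(1, 3, 4)]
```

With the question's sample and n = 2, only the run with GroupID 1 (MasterSeqID 3 to 4) survives the HAVING step.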

Alternative final step (inspired by @t-clausen.dk) that avoids the extra JOIN. I'll test both to see which performs better.

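-- Replaces the `filter` CTE and the final SELECT of the query above
,
    filter AS
(
    -- The window count is evaluated after WHERE, so it counts slow rows per run
    SELECT
        MasterSeqID - PartIDSeqID                              AS GroupID,
        COUNT(*) OVER (PARTITION BY MasterSeqID - PartIDSeqID) AS GroupSize,
        *
    FROM
        sequenced
    WHERE
        PartitionID = 1
)
SELECT
    *
FROM
    filter
WHERE
    GroupSize >= @n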
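The windowed COUNT(*) gives every slow row its run length directly, so rows can be filtered without joining back to `sequenced`. Sketched in Python under the same assumptions (in-memory (EventTime, Speed) tuples, hypothetical function name), with a `Counter` standing in for COUNT(*) OVER (PARTITION BY GroupID):

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, int]  # (EventTime, Speed)


def filter_by_group_size(rows: List[Row], n: int, limit: int = 10) -> List[Tuple[int, int, str, int]]:
    """(GroupID, GroupSize, EventTime, Speed) for slow rows in runs of >= n rows."""
    keyed: List[Tuple[int, str, int]] = []
    slow_seq = 0  # PartIDSeqID within PartitionID = 1
    for master_seq, (event_time, speed) in enumerate(sorted(rows), start=1):
        if speed < limit:
            slow_seq += 1
            keyed.append((master_seq - slow_seq, event_time, speed))
    # COUNT(*) OVER (PARTITION BY GroupID), computed after the slow-row filter
    sizes = Counter(group_id for group_id, _, _ in keyed)
    return [
        (group_id, sizes[group_id], event_time, speed)
        for group_id, event_time, speed in keyed
        if sizes[group_id] >= n  # WHERE GroupSize >= @n
    ]


data = [
    ("2012-02-05 13:56:36.980", 2),
    ("2012-02-05 13:57:36.980", 11),
    ("2012-02-05 13:57:46.980", 2),
    ("2012-02-05 13:59:36.980", 2),
    ("2012-02-05 14:06:36.980", 22),
    ("2012-02-05 15:56:36.980", 2),
]

print(filter_by_group_size(data, 2))
```

Each returned row carries its own GroupSize, which is the point of this variant: no second pass over the data is needed to recover the members of a run.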

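Is it possible to get the time between the first and last record in the sequence? So if my @n = 2, it would return the first and last rows, or the total time from the first row to the last row? – Faraday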


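@vijay - Yes, you have plenty of options here. In the last step, just group by the 'GroupID' field created in the 'filter' step and use 'MIN(EventTime)' and 'MAX(EventTime)'. Or add the 'MIN()' and 'MAX()' calculations to the 'filter' step itself. – MatBailie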
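Grouping by GroupID and taking MIN(EventTime) and MAX(EventTime) gives each run's start, end and elapsed time. A minimal Python sketch of that aggregation, assuming ISO-formatted EventTime strings parsed with `datetime.fromisoformat` (Python 3.7+); `run_spans` is a hypothetical name:

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (EventTime, Speed)


def run_spans(rows: List[Row], n: int, limit: int = 10) -> List[Tuple[int, datetime, datetime, timedelta]]:
    """GroupID, MIN(EventTime), MAX(EventTime) and elapsed time for each run of >= n slow rows."""
    spans: Dict[int, Tuple[datetime, datetime, int]] = {}
    slow_seq = 0
    for master_seq, (event_time, speed) in enumerate(sorted(rows), start=1):
        if speed < limit:
            slow_seq += 1
            group_id = master_seq - slow_seq  # same GroupID as the ROW_NUMBER difference
            ts = datetime.fromisoformat(event_time)
            first, last, count = spans.get(group_id, (ts, ts, 0))
            spans[group_id] = (min(first, ts), max(last, ts), count + 1)
    return [
        (group_id, first, last, last - first)
        for group_id, (first, last, count) in sorted(spans.items())
        if count >= n
    ]


data = [
    ("2012-02-05 13:56:36.980", 2),
    ("2012-02-05 13:57:36.980", 11),
    ("2012-02-05 13:57:46.980", 2),
    ("2012-02-05 13:59:36.980", 2),
    ("2012-02-05 14:06:36.980", 22),
    ("2012-02-05 15:56:36.980", 2),
]

for group_id, first, last, elapsed in run_spans(data, 2):
    print(group_id, elapsed)  # 1 0:01:50
```

For the sample data and n = 2 this yields a single run (GroupID 1) lasting one minute fifty seconds.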


@Dems I think you have a good idea with the new code, but there are some references to columns that aren't there (in sequenced) –

declare @t table(EventTime datetime, Speed int)
insert @t values('2012-02-05 13:56:36.980', 2)
insert @t values('2012-02-05 13:57:36.980', 11)
insert @t values('2012-02-05 13:57:46.980', 2)
insert @t values('2012-02-05 13:59:36.980', 2)
insert @t values('2012-02-05 14:06:36.980', 22)
insert @t values('2012-02-05 15:56:36.980', 2)

declare @N int = 1  -- a run is returned when it has MORE than @N rows

;with a as
(
    -- number the rows in EventTime order
    select EventTime, Speed, row_number() over (order by EventTime) rn from @t
), b as
(
    -- anchor: the first row starts group 1
    select EventTime, Speed, 1 grp, rn from a where rn = 1
    union all
    -- walk row by row: stay in the same group only while this row and
    -- the previous row are both slow, otherwise start a new group
    select a.EventTime, a.Speed,
           case when a.Speed < 10 and b.Speed < 10 then grp else grp + 1 end,
           a.rn
    from a join b on a.rn = b.rn + 1
), c as
(
    select EventTime, Speed, count(*) over (partition by grp) cnt from b
)
select * from c
where cnt > @N
  and Speed < 10  -- fast rows always form one-row groups; exclude them even if @N = 0
OPTION (MAXRECURSION 0) -- Thx Dems
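The recursive CTE walks the rows one at a time in rn order and bumps grp whenever the current row and the previous row are not both slow; the recursion depth therefore equals the row count, which is why SQL Server's default MAXRECURSION limit of 100 has to be lifted. Procedurally this is just a loop. A minimal Python sketch, assuming in-memory rows and a hypothetical `running_groups` function, keeping the answer's `cnt > @N` convention (so n = 1 means runs of at least two rows):

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, int]  # (EventTime, Speed)


def running_groups(rows: List[Row], n: int, limit: int = 10) -> List[Row]:
    """Slow rows whose group has MORE than n members, using a running group number."""
    ordered = sorted(rows)
    grps: List[int] = []
    grp = 1  # the anchor row starts group 1
    for i, (_, speed) in enumerate(ordered):
        if i > 0:
            prev_speed = ordered[i - 1][1]
            # same rule as the CASE: stay in the group only if both rows are slow
            if not (speed < limit and prev_speed < limit):
                grp += 1
        grps.append(grp)
    cnt = Counter(grps)  # count(*) over (partition by grp)
    # the speed check drops fast rows, which only matters when n = 0
    return [row for row, g in zip(ordered, grps) if cnt[g] > n and row[1] < limit]


data = [
    ("2012-02-05 13:56:36.980", 2),
    ("2012-02-05 13:57:36.980", 11),
    ("2012-02-05 13:57:46.980", 2),
    ("2012-02-05 13:59:36.980", 2),
    ("2012-02-05 14:06:36.980", 22),
    ("2012-02-05 15:56:36.980", 2),
]

print(running_groups(data, 1))  # rows 3 and 4
```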

+1: Though for 10m records you may need to specify 'OPTION (MAXRECURSION 0)' :) – MatBailie


@Dems - You +1'd this solution; which one do you prefer? – Faraday


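@Vijay - I like 'COUNT(*) OVER (PARTITION BY grp)' better than my 'filter' step. But I don't like the recursive step used here, because of the huge volume of data and the highly sequential nature of the recursion. – MatBailie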


Almost the same idea as Dems, with a small difference:

select *
from (
    select eventtime, speed, rnk, new_rnk,
           rnk - new_rnk as grp,   -- constant within one run of slow rows
           max(rnk) over (partition by rnk - new_rnk) -
           min(rnk) over (partition by rnk - new_rnk) + 1 as no_consec
    from (
        -- new_rnk: position among the slow rows only (not partitioned by speed,
        -- so e.g. speeds 2 and 3 in a row still count as one run)
        select eventtime, rnk, speed,
               row_number() over (order by eventtime) as new_rnk
        from (
            -- rnk: position among all rows
            select eventtime, speed,
                   row_number() over (order by eventtime) as rnk
            from a
        ) x
        where x.speed < 5
    ) y
) z
where no_consec >= 2
order by eventtime;

5 is the speed limit and 2 is the minimum number of consecutive events. To keep the database setup simple, I stored the dates as numbers.
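Because the rnk values within one run are consecutive integers, max(rnk) - min(rnk) + 1 is simply the number of rows in the run, the same value COUNT(*) over that partition would give. A minimal Python sketch of the rnk / new_rnk bookkeeping, assuming numeric event times as in the answer and made-up sample speeds; `consecutive_counts` is a hypothetical name. It keys each run on rnk - new_rnk alone, so slow rows with different speeds (2 then 3 below) stay in one run; adding speed to the key would split them.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # (eventtime as a number, speed)


def consecutive_counts(rows: List[Row], limit: int = 5) -> List[Tuple[int, int, int]]:
    """(eventtime, speed, no_consec) for every row with speed < limit."""
    # rnk: position among ALL rows, ordered by eventtime
    slow = [(rnk, t, s) for rnk, (t, s) in enumerate(sorted(rows), start=1) if s < limit]
    bounds: Dict[int, Tuple[int, int]] = {}
    # new_rnk: position among the slow rows only; rnk - new_rnk identifies the run
    for new_rnk, (rnk, _, _) in enumerate(slow, start=1):
        lo, hi = bounds.get(rnk - new_rnk, (rnk, rnk))
        bounds[rnk - new_rnk] = (min(lo, rnk), max(hi, rnk))
    out: List[Tuple[int, int, int]] = []
    for new_rnk, (rnk, t, s) in enumerate(slow, start=1):
        lo, hi = bounds[rnk - new_rnk]
        out.append((t, s, hi - lo + 1))  # max(rnk) - min(rnk) + 1
    return out


data = [(1, 2), (2, 11), (3, 2), (4, 3), (5, 22), (6, 2)]
print([r for r in consecutive_counts(data) if r[2] >= 2])  # [(3, 2, 2), (4, 3, 2)]
```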

SQLFIDDLE

Edit:

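To answer the comments, I added three columns to the first inner query. To get only the first row of each run, add pos_in_group = 1 to the WHERE clause; the time span then follows directly from min_date and max_date.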

SQLFIDDLE

select eventtime, speed, min_date, max_date, pos_in_group
from (
    select eventtime, speed, rnk, new_rnk,
           rnk - new_rnk as grp,   -- constant within one run of slow rows
           row_number() over (partition by rnk - new_rnk order by eventtime) as pos_in_group,
           min(eventtime) over (partition by rnk - new_rnk) as min_date,
           max(eventtime) over (partition by rnk - new_rnk) as max_date,
           max(rnk) over (partition by rnk - new_rnk) -
           min(rnk) over (partition by rnk - new_rnk) + 1 as no_consec
    from (
        -- new_rnk: position among the slow rows only (not partitioned by speed)
        select eventtime, rnk, speed,
               row_number() over (order by eventtime) as new_rnk
        from (
            -- rnk: position among all rows
            select eventtime, speed,
                   row_number() over (order by eventtime) as rnk
            from a
        ) x
        where x.speed < 5
    ) y
) z
where no_consec > 1
order by eventtime;
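Taking the pos_in_group = 1 row of each run and subtracting min_date from max_date answers the follow-up question below. A minimal Python sketch of that step, assuming the question's sample with EventTime converted to seconds after 13:56:36.980 (a conversion made only for this illustration) and a hypothetical `first_row_per_run` function; `min_consec` plays the role of the no_consec filter:

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # (seconds since 13:56:36.980, speed)


def first_row_per_run(rows: List[Row], min_consec: int = 2, limit: int = 5) -> List[Tuple[int, int, int]]:
    """(eventtime, speed, max_date - min_date) for the pos_in_group = 1 row of each qualifying run."""
    runs: Dict[int, List[Row]] = {}
    slow_seq = 0  # new_rnk
    for rnk, (t, s) in enumerate(sorted(rows), start=1):
        if s < limit:
            slow_seq += 1
            runs.setdefault(rnk - slow_seq, []).append((t, s))
    result: List[Tuple[int, int, int]] = []
    for key in sorted(runs):
        members = runs[key]  # in eventtime order, so members[0] has pos_in_group = 1
        if len(members) >= min_consec:  # no_consec filter
            first_t, first_s = members[0]
            result.append((first_t, first_s, members[-1][0] - first_t))
    return result


data = [(0, 2), (60, 11), (70, 2), (180, 2), (600, 22), (7200, 2)]
print(first_row_per_run(data))  # [(70, 2, 110)]
```

The single result is row 3 of the question (13:57:46.980) with a 110-second span to row 4.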

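Is it possible to return only the first row of the sequence, i.e. if no_consec = 2, return the first row for which that holds, together with the difference in EventTime between the first and last rows? – Faraday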