2014-10-20 139 views
1

SQL timestamp difference

I have the following data in a table (MS SQL Server 2012):

cinderellaID statusName    timestamp 
------------ ------------------------- ----------------------- 
10459  Waiting     2013-03-16 12:03:17.000 
10459  Paired     2013-03-16 12:29:50.000 
10459  Shopping     2013-03-16 12:29:22.233 
10459  Checked Out    2013-03-16 14:01:24.000 
10461  Alterations    1988-01-02 01:47:07.000 
10461  Checked Out    2013-03-16 14:42:25.000 
10461  Paired     2013-03-16 12:29:31.000 
10461  Shopping     2013-03-16 12:29:01.437 
10461  Waiting     2013-03-16 11:52:18.000 
10462  Waiting     2013-03-16 12:19:35.000 
10462  Shopping     2013-03-16 12:59:01.197 
10462  Paired     2013-03-16 12:59:28.000 
10462  Checked Out    2013-03-16 14:35:44.000 
10463  Checked Out    2013-03-16 12:22:20.000 
10463  Waiting     2013-03-16 10:44:14.000 
10463  Paired     2013-03-16 11:00:37.000 
10463  Shopping     2013-03-16 11:00:23.063 
10464  Waiting     2013-03-16 10:44:03.000 
10464  Paired     2013-03-16 10:59:32.000 
10464  Shopping     2013-03-16 10:59:02.560 
10464  Alterations    1988-01-02 00:44:02.000 
10464  Checked Out    2013-03-16 13:18:21.000 
10465  Checked Out    2013-03-16 11:54:34.000 
10465  Waiting     2013-03-16 09:44:13.000 
10465  Paired     2013-03-16 10:08:05.000 
10465  Shopping     2013-03-16 10:10:58.323 
10466  Waiting     2013-03-16 12:13:51.000 
10466  Shopping     2013-03-16 12:46:56.207 
10466  Paired     2013-03-16 12:46:43.000 
10467  Shopping     2013-03-16 10:08:06.553 
10467  Paired     2013-03-16 10:04:49.000 
10467  Waiting     2013-03-16 09:41:03.000 
<much more data ...> 

The data is shown ordered by cinderellaID, but only to make the question easier to follow.

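These transactions show the time at which a person (identified by cinderellaID) entered each status. For example, in the first row, Cinderella 10459 entered the "Waiting" stage at 2013-03-16 12:03:17.000. There is always a flow in the data (or there should be): Waiting always transitions to Paired, Paired to Shopping, and Shopping to either Checked Out or Alterations. If it goes Shopping -> Alterations, it then goes Alterations -> Checked Out. I know that not all of the data was captured, but that is fine with me.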

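What I want is a way to calculate the average time spent in each stage. For example, how long did each person spend "Waiting" before moving to "Paired"? How long did each person spend "Paired" before going to "Shopping"? Ideally, my output would look like this (I made this data up):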

status  avgTimeSpent 
------------- ----------------- 
Waiting  1:00:04 
Paired  0:20:22 
Shopping  1:30:11 
... 

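I'm familiar with grouping and what I'd call "plain old SQL", but I'm not familiar with the kind of row-to-row operations that I think I need to solve this. Any help?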

Answers

1

Something like this should work:

SELECT 
    t1.cinderellaID, 
    t1.statusName, 
    AVG(DATEDIFF(second, t1.timestamp, t2.timestamp)) As AvgTime 
FROM  YourTable As t1 
INNER JOIN YourTable As t2 
    ON t1.cinderellaID = t2.cinderellaID 
    AND t1.timestamp < t2.timestamp 
    AND NOT EXISTS(Select * From YourTable As t3 
        Where t3.cinderellaID = t1.cinderellaID 
        And t3.timestamp < t2.timestamp 
        And t3.timestamp > t1.timestamp) 
GROUP BY t1.cinderellaID, t1.statusName 

This query should work in any version of SQL. There is a more efficient query that uses the ROW_NUMBER() OVER(..) function, but not every SQL dialect supports it.

I see that you do have the SQL-Server 2012 tag, which does support this function, so here it is:

;WITH cte As 
(
    SELECT *, 
     ROW_NUMBER() OVER(
         PARTITION BY cinderellaID -- number each person's rows in time order so consecutive statuses are adjacent
         ORDER BY timestamp) As rowNum 
    FROM YourTable 
) 
SELECT 
    t1.cinderellaID, 
    t1.statusName, 
    AVG(DATEDIFF(second, t1.timestamp, t2.timestamp)) As AvgTime 
FROM  cte As t1 
INNER JOIN cte As t2 
    ON t1.cinderellaID = t2.cinderellaID 
    AND t1.timestamp < t2.timestamp 
    AND t1.rowNum = t2.rowNum-1 
GROUP BY t1.cinderellaID, t1.statusName 
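
The question asks for one average per status across all people, whereas grouping by cinderellaID and statusName returns one row per person per status. A minimal sketch of that roll-up follows, assuming the same YourTable placeholder and a datetime-like timestamp column; the CTE name ordered, the aliases cur and nxt, and the column AvgSeconds are illustrative only. It numbers each person's rows in timestamp order and groups by statusName alone:

```sql
;WITH ordered AS
(
    -- Number each person's rows in time order (assumes no tied timestamps per person).
    SELECT cinderellaID,
           statusName,
           [timestamp],
           ROW_NUMBER() OVER(
               PARTITION BY cinderellaID
               ORDER BY [timestamp]) AS rowNum
    FROM YourTable
)
SELECT
    cur.statusName,
    -- Multiply by 1.0 so AVG works on decimals instead of truncating to int.
    AVG(1.0 * DATEDIFF(second, cur.[timestamp], nxt.[timestamp])) AS AvgSeconds
FROM ordered AS cur
INNER JOIN ordered AS nxt
    ON  nxt.cinderellaID = cur.cinderellaID
    AND nxt.rowNum = cur.rowNum + 1   -- the row that immediately follows
GROUP BY cur.statusName;
```

The last status recorded for each person has no following row, so the inner join leaves it out of the averages.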
+0

Oops, OK, I see now that you have the SQL_SERVER-2012 tag. – RBarryYoung 2014-10-20 14:49:18

+1

Depending on the data type of 'timestamp', I think this will run into problems with either the subtraction ('datetime2') or the AVG ('datetime') – 2014-10-20 14:55:42

+0

@Damien_The_Unbeliever Yes, you're right. I should have used DATEDIFF the way Gordon did. Corrected now. – RBarryYoung 2014-10-20 15:03:25

1

You can do what you want using lead(). The basic query to get the information you need is:

select t.*, 
     lead(statusname) over (partition by cinderellaID order by timestamp) as next_statusname, 
     lead(timestamp) over (partition by cinderellaID order by timestamp) as next_timestamp 
from singletable t; 
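
To see what lead() returns, here is a small sketch that loads the four sample rows for cinderellaID 10459 into a temporary table and runs the same window expressions. The temp table name #sample and its column types are assumptions made for illustration; the real table may differ.

```sql
-- Hypothetical temp table holding the question's sample rows for one person.
create table #sample (
    cinderellaID int,
    statusName   varchar(25),
    [timestamp]  datetime
);

insert into #sample (cinderellaID, statusName, [timestamp]) values
    (10459, 'Waiting',     '2013-03-16T12:03:17.000'),
    (10459, 'Paired',      '2013-03-16T12:29:50.000'),
    (10459, 'Shopping',    '2013-03-16T12:29:22.233'),
    (10459, 'Checked Out', '2013-03-16T14:01:24.000');

-- Each row is paired with the next row (by time) for the same person.
select s.statusName,
       s.[timestamp],
       lead(s.statusName)  over (partition by s.cinderellaID order by s.[timestamp]) as next_statusname,
       lead(s.[timestamp]) over (partition by s.cinderellaID order by s.[timestamp]) as next_timestamp
from #sample s
order by s.[timestamp];

drop table #sample;
```

In this sample the Shopping timestamp (12:29:22.233) is earlier than the Paired timestamp (12:29:50.000), so ordering by timestamp pairs Waiting with Shopping and Shopping with Paired, and the final Checked Out row gets NULL for both lead() columns.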

Then, to get the averages:

select statusname, next_statusname, 
     avg(datediff(second, timestamp, next_timestamp)) as avg_seconds 
from (select t.*, 
      lead(statusname) over (partition by cinderellaID order by timestamp) as next_statusname, 
      lead(timestamp) over (partition by cinderellaID order by timestamp) as next_timestamp 
     from singletable t 
    ) t 
group by statusname, next_statusname; 
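
The question's desired output shows one row per status with the time as h:mm:ss. A possible sketch of that final formatting step is below, assuming the same singletable name and a datetime-like timestamp column; the CTE names transitions and averages and the aliases entered_at and left_at are illustrative only.

```sql
with transitions as (
    -- Pair each status row with the time the same person entered the next status.
    select statusname,
           [timestamp] as entered_at,
           lead([timestamp]) over (partition by cinderellaID order by [timestamp]) as left_at
    from singletable
),
averages as (
    select statusname,
           avg(datediff(second, entered_at, left_at)) as avg_seconds
    from transitions
    where left_at is not null   -- a person's last status has no successor
    group by statusname
)
select statusname as status,
       -- Build h:mm:ss from whole seconds using integer division and modulo.
       cast(avg_seconds / 3600 as varchar(10)) + ':' +
       right('0' + cast((avg_seconds % 3600) / 60 as varchar(2)), 2) + ':' +
       right('0' + cast(avg_seconds % 60 as varchar(2)), 2) as avgTimeSpent
from averages;
```

Rows such as the 1988 Alterations timestamps in the sample data sort ahead of everything else for that person and would distort these averages, so they may need to be filtered out or corrected first.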
+0

+1: Another good approach. I always forget that they added "LEAD() OVER(..)" in SQL Server 2012. – RBarryYoung 2014-10-20 14:57:54