2012-11-01

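MySQL group with a lookahead?

I have an event log that records when a device starts or stops with a failure code, and I'm trying to calculate the average and median time between failures and starts. Here is a very simple example table of the data: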

+----+-----------+---------------------+
| id | eventName | eventTime           |
+----+-----------+---------------------+
|  1 | start     | 2012-11-01 14:25:20 |
|  2 | fail A    | 2012-11-01 14:27:45 |
|  3 | start     | 2012-11-01 14:30:49 |
|  4 | fail B    | 2012-11-01 14:32:54 |
|  5 | start     | 2012-11-01 14:35:59 |
|  6 | fail A    | 2012-11-01 14:37:02 |
|  7 | start     | 2012-11-01 14:38:05 |
|  8 | fail A    | 2012-11-01 14:40:09 |
|  9 | start     | 2012-11-01 14:41:11 |
| 10 | fail C    | 2012-11-01 14:43:14 |
+----+-----------+---------------------+

Create code:

CREATE TABLE `test` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT, 
    `eventName` varchar(50) NOT NULL, 
    `eventTime` datetime NOT NULL, 
    PRIMARY KEY (`id`) 
); 
INSERT INTO `test` (`id`, `eventName`, `eventTime`) VALUES
    (1, 'start', '2012-11-01 14:25:20'),
    (2, 'fail A', '2012-11-01 14:27:45'),
    (3, 'start', '2012-11-01 14:30:49'),
    (4, 'fail B', '2012-11-01 14:32:54'),
    (5, 'start', '2012-11-01 14:35:59'),
    (6, 'fail A', '2012-11-01 14:37:02'),
    (7, 'start', '2012-11-01 14:38:05'),
    (8, 'fail A', '2012-11-01 14:40:09'),
    (9, 'start', '2012-11-01 14:41:11'),
    (10, 'fail C', '2012-11-01 14:43:14');

I can get the times to start and to fail using something like this:

SET @time_prev := -1;
SELECT
    *
FROM
(
    SELECT
        eventName
        , eventTime
        , @ts := UNIX_TIMESTAMP(eventTime) AS ts
        , @started := IF(eventName = 'start', 1, 0) AS started
        , @failed := IF(eventName <> 'start', 1, 0) AS failed
        , @time_diff := IF(@time_prev > -1, @ts - @time_prev, 0) AS time_diff
        , @time_prev := @ts AS time_prev
        , @time_to_fail := IF(@failed, @time_diff, 0) AS time_to_fail
        , @time_to_start := IF(@started, @time_diff, 0) AS time_to_start
    FROM
        test
) AS t1;

+-----------+---------------------+------------+---------+--------+-----------+------------+--------------+---------------+
| eventName | eventTime           | ts         | started | failed | time_diff | time_prev  | time_to_fail | time_to_start |
+-----------+---------------------+------------+---------+--------+-----------+------------+--------------+---------------+
| start     | 2012-11-01 14:25:20 | 1351805120 |       1 |      0 |         0 | 1351805120 | 0            | 0             |
| fail A    | 2012-11-01 14:27:45 | 1351805265 |       0 |      1 |       145 | 1351805265 | 0            | 145           |
| start     | 2012-11-01 14:30:49 | 1351805449 |       1 |      0 |       184 | 1351805449 | 184          | 0             |
| fail B    | 2012-11-01 14:32:54 | 1351805574 |       0 |      1 |       125 | 1351805574 | 0            | 125           |
| start     | 2012-11-01 14:35:59 | 1351805759 |       1 |      0 |       185 | 1351805759 | 185          | 0             |
| fail A    | 2012-11-01 14:37:02 | 1351805822 |       0 |      1 |        63 | 1351805822 | 0            | 63            |
| start     | 2012-11-01 14:38:05 | 1351805885 |       1 |      0 |        63 | 1351805885 | 63           | 0             |
| fail A    | 2012-11-01 14:40:09 | 1351806009 |       0 |      1 |       124 | 1351806009 | 0            | 124           |
| start     | 2012-11-01 14:41:11 | 1351806071 |       1 |      0 |        62 | 1351806071 | 62           | 0             |
| fail C    | 2012-11-01 14:43:14 | 1351806194 |       0 |      1 |       123 | 1351806194 | 0            | 123           |
+-----------+---------------------+------------+---------+--------+-----------+------------+--------------+---------------+

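But to get the time between a failure and the following start, I have to move ahead to the next record, and at that point I lose the grouping by failure code. How can I take this to the next level and merge the time to the next start back into the failure record, so that it can be grouped?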

Ultimately, after calculating the averages and medians, I would end up with a result set like this:

+-----------+-------------+----------------+--------------+-----------------+
| eventName | avg_to_fail | median_to_fail | avg_to_start | median_to_start |
+-----------+-------------+----------------+--------------+-----------------+
| fail A    |      110.66 |         124.00 |       103.00 |           63.00 |
| fail B    |      125.00 |         125.00 |       185.00 |          185.00 |
+-----------+-------------+----------------+--------------+-----------------+

Answer


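This gives the averages. The median is a pain in SQL; "Simple way to calculate median with MySQL" gives some ideas. The two inner queries produce the result sets you would take the median over, if a median aggregate existed.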

Select
    times.eventName,
    avg(times.timelapse) as avg_to_fail,
    avg(times2.timelapse) as avg_to_start
From (
    -- times2: seconds from each failure to the next start
    Select
        starts.id,
        starts.eventName,
        TimestampDiff(SECOND, starts.eventTime, Min(ends.eventTime)) as timelapse
    From
        test as starts,
        test as ends
    Where
        starts.eventName != 'start' And
        ends.eventName = 'start' And
        ends.eventTime > starts.eventTime
    Group By
        starts.id
) as times2
Right Outer Join (
    -- times: seconds from each start to the next failure
    -- ends.eventName is not in the Group By, so MySQL may take it from any
    -- paired failure, and rejects the query when ONLY_FULL_GROUP_BY is enabled
    Select
        starts.id,
        ends.eventName,
        TimestampDiff(SECOND, starts.eventTime, Min(ends.eventTime)) as timelapse
    From
        test as starts,
        test as ends
    Where
        starts.eventName = 'start' And
        ends.eventName != 'start' And
        ends.eventTime > starts.eventTime
    Group By
        starts.id
) as times
    On times2.eventName = times.eventName
Group By
    times.eventName
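
Since MySQL has no median aggregate, one option is to fetch the per-failure timelapse values that the inner queries produce and take the median in application code. A minimal sketch in Python, assuming the start-to-failure durations below (copied from the sample data in the question) have already been fetched; `to_fail` and `summarize` are illustrative names, not part of the query above:

```python
from statistics import mean, median

# Seconds from each start to the following failure, keyed by failure code.
# The values come from the sample data in the question.
to_fail = {
    "fail A": [145, 63, 124],
    "fail B": [125],
    "fail C": [123],
}


def summarize(samples):
    """Map each failure code to (average, median) of its durations."""
    return {code: (mean(vals), median(vals)) for code, vals in samples.items()}


summary = summarize(to_fail)
for code in sorted(summary):
    avg, med = summary[code]
    print(f"{code}: avg={avg:.2f} median={med:.2f}")
```

The same helper would apply unchanged to the failure-to-start durations.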

To help understand it, I would first consider:

Select
    starts.id as startid,
    ends.eventName as endeventname,
    starts.eventTime as starttime,
    ends.eventTime as endTime
From
    test as starts,
    test as ends
Where
    starts.eventName = 'start' And
    ends.eventName != 'start' And
    ends.eventTime > starts.eventTime

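This is the essence of the inner query times, without the Group By and Min statements. You'll see that it has one row pairing each start event with every end event that occurs after that start. Call this X.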
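
To see the shape of X concretely, the same self-join can be run against the sample data. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained); the aliases startid, endeventname, starttime and endtime are chosen here to match the names the answer uses when it refers to X:

```python
import sqlite3

# Sample rows from the question.
events = [
    (1, "start", "2012-11-01 14:25:20"),
    (2, "fail A", "2012-11-01 14:27:45"),
    (3, "start", "2012-11-01 14:30:49"),
    (4, "fail B", "2012-11-01 14:32:54"),
    (5, "start", "2012-11-01 14:35:59"),
    (6, "fail A", "2012-11-01 14:37:02"),
    (7, "start", "2012-11-01 14:38:05"),
    (8, "fail A", "2012-11-01 14:40:09"),
    (9, "start", "2012-11-01 14:41:11"),
    (10, "fail C", "2012-11-01 14:43:14"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, eventName TEXT, eventTime TEXT)"
)
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", events)

# X: every start paired with every failure that happens after it.
# ISO-formatted timestamps compare correctly as plain strings.
pairs = conn.execute(
    """
    SELECT starts.id        AS startid,
           ends.eventName   AS endeventname,
           starts.eventTime AS starttime,
           ends.eventTime   AS endtime
    FROM test AS starts, test AS ends
    WHERE starts.eventName = 'start'
      AND ends.eventName != 'start'
      AND ends.eventTime > starts.eventTime
    ORDER BY starts.id, ends.eventTime
    """
).fetchall()

for row in pairs:
    print(row)
```

The first start pairs with all five failures and the last start with only one, which is why a Min over the end time is needed to keep just the nearest failure.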

The next part is:

Select
    X.startid,
    X.endeventname,
    TimestampDiff(SECOND, X.starttime, Min(X.endTime)) as timelapse
From
    X
Group By
    X.startid

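The key here is the combination of Min(X.endTime) with the Group By. It gives us the earliest end time after the start time (X has already restricted the end events to those that come after). Although I only selected the columns I needed, at this point we have access to the start id, end id, start event, end event, start time, and Min(end time). The reason you can adapt this to find avg_to_start is that we get to choose which event name is the interesting one, because we have both.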
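
One way to keep the failure code attached to both intervals is to group by the failure row rather than by the start row: the latest start before the failure gives the time to fail, and the earliest start after it gives the time to start. The following is a sketch under stated assumptions, not the answer's query: it runs on SQLite through Python's sqlite3 module, uses strftime('%s', ...) where MySQL would use TimestampDiff, and takes the median in Python with statistics.median. The names per_failure, stats and result are illustrative.

```python
import sqlite3
from collections import defaultdict
from statistics import mean, median

# Sample rows from the question.
events = [
    (1, "start", "2012-11-01 14:25:20"),
    (2, "fail A", "2012-11-01 14:27:45"),
    (3, "start", "2012-11-01 14:30:49"),
    (4, "fail B", "2012-11-01 14:32:54"),
    (5, "start", "2012-11-01 14:35:59"),
    (6, "fail A", "2012-11-01 14:37:02"),
    (7, "start", "2012-11-01 14:38:05"),
    (8, "fail A", "2012-11-01 14:40:09"),
    (9, "start", "2012-11-01 14:41:11"),
    (10, "fail C", "2012-11-01 14:43:14"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, eventName TEXT, eventTime TEXT)"
)
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", events)

# One row per failure: seconds since the previous start (to_fail) and
# seconds until the next start (to_start, NULL if no start follows).
per_failure = conn.execute(
    """
    SELECT
        f.id,
        f.eventName,
        CAST(strftime('%s', f.eventTime) AS INTEGER)
          - CAST(strftime('%s', MAX(CASE WHEN s.eventTime < f.eventTime
                                         THEN s.eventTime END)) AS INTEGER)
          AS to_fail,
        CAST(strftime('%s', MIN(CASE WHEN s.eventTime > f.eventTime
                                     THEN s.eventTime END)) AS INTEGER)
          - CAST(strftime('%s', f.eventTime) AS INTEGER)
          AS to_start
    FROM test AS f
    JOIN test AS s ON s.eventName = 'start'
    WHERE f.eventName != 'start'
    GROUP BY f.id, f.eventName, f.eventTime
    ORDER BY f.id
    """
).fetchall()

# Collect the durations per failure code, skipping missing values.
to_fail, to_start = defaultdict(list), defaultdict(list)
for _id, name, fail_secs, start_secs in per_failure:
    if fail_secs is not None:
        to_fail[name].append(fail_secs)
    if start_secs is not None:
        to_start[name].append(start_secs)


def stats(values):
    """Return (average, median), or (None, None) when there are no samples."""
    return (mean(values), median(values)) if values else (None, None)


# code -> (avg_to_fail, median_to_fail, avg_to_start, median_to_start)
result = {
    name: stats(to_fail[name]) + stats(to_start[name]) for name in sorted(to_fail)
}
for name, values in result.items():
    print(name, values)
```

Because the grouping key is the failure's own id, the event name belongs unambiguously to each group. In this sketch a failure with no later start (fail C in the sample) keeps its time to fail and reports no time to start.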

SQL Fiddle: http://sqlfiddle.com/#!2/90465/6


I removed median from the title; that's not the problem. The problem is calculating the second avg/median based on the next row of data. – fwrawx


@fwrawx - I've updated it according to your spec to provide avg_to_fail. Adapting it for avg_to_start is easy. You can then full outer join the two result sets on EventName. – Laurence


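*_to_fail is the easy part; getting *_to_start and merging it in is the hard part, because 1) eventName is the same for all records and 2) the time is calculated from the previous record – fwrawx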