2017-01-12

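Google BigQuery: most active moment based on time intervals

Say I have a table activities with the fields starttime (TIMESTAMP) and stoptime (TIMESTAMP). I want to find a moment at which the largest number of activities are in progress. If there are several such moments, the query should return the earliest one.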

I tried taking every starttime timestamp, counting for each one the number of activities in progress at that moment, and then picking the maximum:

#standardSQL 
SELECT 
    time, 
    (
        SELECT COUNT(*)
        FROM activities
        WHERE starttime <= time AND time <= stoptime
    ) AS cnt
FROM (
    SELECT DISTINCT starttime AS time 
    FROM activities 
    ORDER BY time 
) 
ORDER BY cnt DESC, time ASC 
LIMIT 1 

Unfortunately, it fails with: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
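For reference, the logic this query is meant to express can be sketched outside SQL. The following is only an illustrative brute-force version in Python, not something BigQuery runs: the function name and the sample data are hypothetical, and integers stand in for timestamps. It treats intervals as closed, as the WHERE clause above does, and only checks start times, since an overlap count can only rise at the moment some activity starts.

```python
def busiest_start_bruteforce(activities):
    """Return (moment, count) for the earliest start time covered by the most activities.

    activities: list of (starttime, stoptime) pairs. Intervals are treated as
    closed, matching `starttime <= time AND time <= stoptime`.
    Returns None for an empty list.
    """
    best = None
    # Candidate moments are the distinct start times, visited in ascending order.
    for t in sorted({start for start, _ in activities}):
        cnt = sum(1 for start, stop in activities if start <= t <= stop)
        # Strict '>' keeps the earliest moment among ties.
        if best is None or cnt > best[1]:
            best = (t, cnt)
    return best


# Hypothetical sample data; integers stand in for timestamps.
sample = [(1, 3), (1, 4), (4, 5), (7, 8), (7, 10), (8, 12)]
print(busiest_start_bruteforce(sample))
```

This is quadratic in the number of activities, so it is mainly useful as a correctness check for faster approaches on small data.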

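I think a suitable algorithm outside the database world would be to put all starttimes and stoptimes into a single array in a way that keeps them distinguishable, sort it, and then walk through the array in order, looking for the moment where the count peaks. However, I don't know how to express such an algorithm in SQL. I have looked at this, but I don't think it helps.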
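The array-based idea described above might look roughly like this in Python. It is a sketch under assumptions: the function name and sample data are made up for illustration, integers stand in for timestamps, and intervals are treated as closed, so a start and a stop at the same timestamp count as overlapping.

```python
def busiest_moment_sweep(activities):
    """Return (moment, count) for the earliest moment with the most concurrent activities.

    activities: list of (starttime, stoptime) pairs with closed intervals.
    Returns (None, 0) for an empty list.
    """
    events = []
    for start, stop in activities:
        # The middle element is a tie-breaker: at equal times, starts (0)
        # sort before stops (1), so touching intervals count as overlapping.
        events.append((start, 0, +1))
        events.append((stop, 1, -1))
    events.sort()

    running = 0
    best_time, best_count = None, 0
    for time, _, delta in events:
        running += delta
        # Strict '>' keeps the earliest moment among equal peaks.
        if running > best_count:
            best_time, best_count = time, running
    return best_time, best_count


# Hypothetical sample data; integers stand in for timestamps.
sample = [(1, 3), (1, 4), (4, 5), (7, 8), (7, 10), (8, 12)]
print(busiest_moment_sweep(sample))
```

Sorting dominates the cost, so this runs in O(n log n) rather than comparing every start time against every activity.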

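What is the granularity of your moments: seconds, minutes, hours, or something else?

@MikhailBerlyant I think it is milliseconds.

So you need to find the single busiest millisecond over the whole time range? Please confirm, because that does not sound practical for most use cases, though you may have a special case.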

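Answers

I have implemented something close to the algorithm I described in the question. It runs fast, but if you find something better, I would be glad to see it.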

#standardSQL 
SELECT time, SUM(add) OVER(ORDER BY time ASC, add DESC) AS cumsum 
FROM (
    SELECT starttime AS time, 1 AS add 
    FROM activities UNION ALL 
    SELECT stoptime AS time, -1 AS add 
    FROM activities 
) 
ORDER BY cumsum DESC 

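Consider the version below.
From my perspective, it returns more practical output: all the periods of continuous activity at the same level, each with its start and end.
So you get not just the starting moment but the whole period (start and end) with the highest activity, and not just one such period but all of them.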

#standardSQL 
WITH intervals AS (
    SELECT time AS start_, LEAD(time) OVER(ORDER BY time) AS end_ 
    FROM (
    SELECT DISTINCT time FROM (
     SELECT starttime AS time FROM activities UNION ALL 
     SELECT stoptime AS time FROM activities)) 
), 
equals AS (
    SELECT start_, end_, COUNT(1) AS cumsum 
    FROM intervals AS i 
    JOIN activities AS a 
    ON i.start_ >= a.starttime AND i.end_ <= a.stoptime 
    GROUP BY start_, end_ 
), 
grps AS (
    -- flag = 1 when a row starts a new run, i.e. it does not directly
    -- continue the previous interval at the same activity level.
    -- For the first row LAG() is NULL, the condition is NULL, and IF returns 1.
    SELECT
    start_, end_, cumsum,
    IF(
     start_ = LAG(end_) OVER(ORDER BY start_) AND cumsum = LAG(cumsum) OVER(ORDER BY start_),
     0, 1
    ) AS flag
    FROM equals
)
SELECT MIN(start_) AS start_, MAX(end_) AS end_, cumsum 
FROM (
    SELECT start_, end_, cumsum, SUM(flag) OVER(ORDER BY start_) AS grp 
    FROM grps 
) 
GROUP BY cumsum, grp 
ORDER BY start_ 

You can test the query above with a dummy activities table, using either integer or TIMESTAMP values:

WITH activities AS (
    SELECT 1 AS starttime, 3 AS stoptime UNION ALL 
    SELECT 1 AS starttime, 4 AS stoptime UNION ALL 
    SELECT 4 AS starttime, 5 AS stoptime UNION ALL 
    SELECT 7 AS starttime, 8 AS stoptime UNION ALL 
    SELECT 7 AS starttime, 10 AS stoptime UNION ALL 
    SELECT 8 AS starttime, 12 AS stoptime 
) 

WITH activities AS (
    SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 3 MINUTE) AS stoptime UNION ALL 
    SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 MINUTE) AS stoptime UNION ALL 
    SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE) AS stoptime UNION ALL 
    SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 8 MINUTE) AS stoptime UNION ALL 
    SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 10 MINUTE) AS stoptime UNION ALL 
    SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 8 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 MINUTE) AS stoptime 
)
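To cross-check what the interval query is expected to return for the integer dummy table, the same idea can be sketched in Python. This is an illustration of the intent rather than a translation of the SQL: the function name is hypothetical, and the merge step is written as a plain loop instead of window functions.

```python
def activity_levels(activities):
    """Return merged (start, end, count) periods of constant, non-zero activity.

    activities: list of (starttime, stoptime) pairs.
    """
    # Every distinct start or stop time is a boundary between elementary intervals.
    times = sorted({t for pair in activities for t in pair})
    periods = []
    for start, end in zip(times, times[1:]):
        # Count the activities that fully cover this elementary interval.
        count = sum(1 for a, b in activities if a <= start and end <= b)
        if count == 0:
            continue  # like the inner JOIN, skip intervals with no activity
        if periods and periods[-1][1] == start and periods[-1][2] == count:
            # Directly continues the previous period at the same level: extend it.
            periods[-1] = (periods[-1][0], end, count)
        else:
            periods.append((start, end, count))
    return periods


# The integer dummy table from above.
dummy = [(1, 3), (1, 4), (4, 5), (7, 8), (7, 10), (8, 12)]
for row in activity_levels(dummy):
    print(row)
```

Note that this interval-based view reports a highest level of 2 for the dummy data. Three activities do touch at moment 8, but only for a single instant, which no elementary interval captures.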