2017-04-10

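Number of overlapping intervals in BigQuery

Given a table of intervals, can I efficiently query the number of currently open intervals at the start of each interval (including the interval itself)?

For example, given the following table: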

 
start_time  end_time
         1        10
         2         5
         3         4
         5         6
         7        11
        19        20

I would like the following output:

 
start_time  count
         1      1
         2      2
         3      3
         5      3
         7      2
        19      1

On a small dataset, I can solve this by joining the dataset against itself:

WITH intervals AS (
    SELECT 1 AS start_time, 10 AS end_time UNION ALL
    SELECT 2, 5 UNION ALL
    SELECT 3, 4 UNION ALL
    SELECT 5, 6 UNION ALL
    SELECT 7, 11 UNION ALL
    SELECT 19, 20
)
SELECT
    a.start_time,
    COUNT(*)
FROM
    intervals a CROSS JOIN intervals b
WHERE
    -- keep pairs where b is open at a's start time (both bounds inclusive)
    a.start_time >= b.start_time AND
    a.start_time <= b.end_time
GROUP BY a.start_time
ORDER BY a.start_time

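On a large dataset, the CROSS JOIN is neither practical nor necessary, because any given answer depends only on a small number of preceding intervals (when sorted by start_time). In fact, on the dataset I have, it times out. Is there a better way to achieve this?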
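
One way to see why the full cross join is not needed: the count at a start time t equals the number of intervals that have started by t minus the number that ended strictly before t. The following is a minimal sketch of that idea in Python, not BigQuery SQL. The name `open_counts` is made up for illustration, and the sketch assumes every interval has end_time >= start_time and that both bounds are inclusive, as in the example above.

```python
from bisect import bisect_left, bisect_right


def open_counts(intervals):
    """For each interval, count the intervals open at its start time.

    Assumes end >= start for every interval and inclusive bounds.
    """
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    result = []
    for s, _ in sorted(intervals):
        started = bisect_right(starts, s)    # intervals with start_time <= s
        ended_before = bisect_left(ends, s)  # intervals with end_time < s
        result.append((s, started - ended_before))
    return result


intervals = [(1, 10), (2, 5), (3, 4), (5, 6), (7, 11), (19, 20)]
for start_time, count in open_counts(intervals):
    print(start_time, count)
```

Sorting plus two binary searches per row replaces the pairwise comparison, so the work grows roughly as n log n rather than n squared.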


Can you explain the output for the dummy data? – Teja


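The output is the start time of each interval in the input, along with the count of intervals open at that start time (start_time <= that time and end_time >= that time). –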

Answers


... the CROSS JOIN is neither practical nor necessary ...
Is there a better way to achieve this?

Try the BigQuery Standard SQL below. No join is involved.

#standardSQL 
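SELECT
    start_time,
    -- count the collected end times that have not passed yet at start_time
    (SELECT COUNT(1) FROM UNNEST(ends) AS e WHERE e >= start_time) AS cnt
FROM (
    SELECT
        start_time,
        -- end times of all rows whose start_time is <= this row's start_time
        ARRAY_AGG(end_time) OVER(ORDER BY start_time) AS ends
    FROM intervals
)
-- ORDER BY start_time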

You can test / play with it using the example from your question:

#standardSQL 
WITH intervals AS (
    SELECT 1 AS start_time, 10 AS end_time UNION ALL 
    SELECT 2, 5 UNION ALL 
    SELECT 3, 4 UNION ALL 
    SELECT 5, 6 UNION ALL 
    SELECT 7, 11 UNION ALL 
    SELECT 19, 20 
) 
SELECT 
    start_time, 
    (SELECT COUNT(1) FROM UNNEST(ends) AS e WHERE e >= start_time) AS cnt 
FROM (
    SELECT
        start_time,
        ARRAY_AGG(end_time) OVER(ORDER BY start_time) AS ends
    FROM intervals
)
-- ORDER BY start_time 
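
To see what the query above computes, here is a rough Python emulation. It is an illustration only, not how BigQuery executes the query, and `window_counts` is a hypothetical name. With only ORDER BY in the OVER clause, the default window frame runs from the first row through the current row and its peers, so `ends` holds the end_time of every row whose start_time is <= the current one. The scalar subquery then keeps the end times that are >= start_time.

```python
def window_counts(intervals):
    """Emulate the windowed ARRAY_AGG query on a list of (start, end) pairs."""
    rows = sorted(intervals)  # ORDER BY start_time
    out = []
    for start_time, _ in rows:
        # ARRAY_AGG(end_time) OVER (ORDER BY start_time): end times of all
        # rows whose start_time is <= the current row's start_time
        ends = [e for s, e in rows if s <= start_time]
        # (SELECT COUNT(1) FROM UNNEST(ends) AS e WHERE e >= start_time)
        cnt = sum(1 for e in ends if e >= start_time)
        out.append((start_time, cnt))
    return out


intervals = [(1, 10), (2, 5), (3, 4), (5, 6), (7, 11), (19, 20)]
for start_time, cnt in window_counts(intervals):
    print(start_time, cnt)
```

Note that the array built for each row grows with the number of preceding rows, so although no join is involved, each row still scans all earlier end times.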

@BrandonDuRette - did you get a chance to try it? –