2016-03-21 42 views

Using gaps and islands to find consecutive hours/dates - SQL/BigQuery

I have a table in BigQuery that looks like this:

Caller_Number | month | day | call_time
--------------|-------|-----|----------
1             | 5     | 15  | 12:56:17

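I would like to write SQL queries in BigQuery that let me count the number of consecutive hours in which at least one call was made (sorted by caller number), as well as the consecutive days with at least 10 consecutive hours in which calls occurred (sorted by caller number). I have been looking at existing resources on gaps and islands, but I can't figure out how to apply the technique to consecutive dates and hours.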

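"Sorted by caller_number" makes the question above very vague. You should provide more details. You might want to share an example of the expected result. Either dropping "sorted by caller_number" or saying "partitioned by caller_number" would make it completely clear compared with the current wording –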

OK, a sample result would look like this: Caller_Number | Month | Day | Num_Consec_Hours. A second sample result would look like this: Caller_Number | Month | Num_Consec_Days – argunaw

Answer

Below is a working example for consecutive hours.
The steps are:
1. "Extract" the hour from call_time

HOUR(TIMESTAMP(CURRENT_DATE() + ' ' + call_time)) 

2. Find the previous hour within the group

LAG([hour]) OVER(PARTITION BY Caller_Number, [month], [day] ORDER BY [hour]) 

3. Calculate the start of each run of consecutive hours: 1 = start, 0 = continuation of the group

IFNULL(INTEGER([hour] - prev_hour > 1), 1) 

4. Assign a group number to each group

SUM(seq) OVER(PARTITION BY Caller_Number, [month], [day] ORDER BY [hour]) 

Finally, just group by the group number above and count calls and hours.
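The four steps above can be traced outside BigQuery on a handful of rows. What follows is only a rough Python sketch of the same logic, not the query itself: the sample rows and the function name `hour_islands` are made up for illustration, and it assumes `call_time` is an `HH:MM:SS` string.

```python
# Hypothetical sample rows: (Caller_Number, month, day, call_time)
calls = [
    (1, 5, 15, "12:56:17"),
    (1, 5, 15, "13:05:00"),
    (1, 5, 15, "13:40:12"),
    (1, 5, 15, "14:01:59"),
    (1, 5, 15, "17:30:00"),
    (1, 5, 16, "09:00:00"),
]

def hour_islands(rows):
    """Return (caller, month, day, seq_group, hours_count, calls_count) tuples."""
    # Step 1: extract the hour; sorting orders rows by caller, month, day, hour.
    keyed = sorted((c, m, d, int(t.split(":")[0])) for c, m, d, t in rows)

    islands = {}  # (caller, month, day, seq_group) -> [set of hours, call count]
    prev_key, prev_hour, seq_group = None, None, 0
    for c, m, d, hour in keyed:
        if (c, m, d) != prev_key:
            # New partition: there is no previous hour and numbering restarts.
            prev_key, prev_hour, seq_group = (c, m, d), None, 0
        # Steps 2-3: compare with the previous hour; 1 = start of a run, 0 = continuation.
        seq = 1 if prev_hour is None or hour - prev_hour > 1 else 0
        # Step 4: the running sum of seq is the group number.
        seq_group += seq
        entry = islands.setdefault((c, m, d, seq_group), [set(), 0])
        entry[0].add(hour)   # distinct hours in this run
        entry[1] += 1        # calls in this run
        prev_hour = hour

    # Final step: one row per group with its hour and call counts.
    return [key + (len(islands[key][0]), islands[key][1]) for key in sorted(islands)]

for row in hour_islands(calls):
    print(row)
```

With these sample rows, caller 1 on 5/15 gets one run covering hours 12 to 14 (3 hours, 4 calls) and a separate run at hour 17, because the gap between 14 and 17 is larger than one hour. Like the query, the sketch partitions by day, so a run that crosses midnight is counted as two runs.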

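The full query follows. I hope it gives you a good start for implementing consecutive days with similar logic on top of the hourly result.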

SELECT Caller_Number, [month], [day], seq_group,
    EXACT_COUNT_DISTINCT([hour]) AS hours_count, COUNT(1) AS calls_count
FROM (
    SELECT Caller_Number, [month], [day], [hour],
        SUM(seq) OVER(PARTITION BY Caller_Number, [month], [day]
            ORDER BY [hour]) AS seq_group
    FROM (
        SELECT Caller_Number, [month], [day], [hour],
            IFNULL(INTEGER([hour] - prev_hour > 1), 1) AS seq
        FROM (
            SELECT Caller_Number, [month], [day], [hour],
                LAG([hour]) OVER(PARTITION BY Caller_Number, [month], [day]
                    ORDER BY [hour]) AS prev_hour
            FROM (
                SELECT Caller_Number, [month], [day],
                    HOUR(TIMESTAMP(CURRENT_DATE() + ' ' + call_time)) AS [hour]
                FROM YourTable
            )
        )
    )
)
GROUP BY Caller_Number, [month], [day], seq_group
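
For the consecutive-days part, the same start/continuation idea could be applied to the per-day output of the query above. Here is a minimal Python sketch of one possible reading, not a tested BigQuery solution. The function name `day_islands`, the `min_hours` threshold, and the fixed `YEAR` are all assumptions introduced for illustration (the table has no year column, so some year has to be assumed to build real dates).

```python
from datetime import date

YEAR = 2016  # assumption: the table has no year column

def day_islands(day_rows, min_hours=1):
    """day_rows: (caller, month, day, hours_count) tuples, one per hour run.

    Returns (caller, start_month, start_day, num_consec_days) tuples.
    """
    # Keep each (caller, date) that has at least one run of min_hours hours.
    active = sorted({(c, date(YEAR, m, d))
                     for c, m, d, hours in day_rows if hours >= min_hours})

    runs = []  # each entry: [caller, first day of the run, length in days]
    prev_caller, prev_day = None, None
    for caller, day in active:
        if caller == prev_caller and (day - prev_day).days == 1:
            runs[-1][2] += 1               # continuation of the current run
        else:
            runs.append([caller, day, 1])  # start of a new run
        prev_caller, prev_day = caller, day
    return [(c, start.month, start.day, n) for c, start, n in runs]

# Hypothetical per-run rows, shaped like the query result without seq_group.
sample = [
    (1, 5, 15, 3), (1, 5, 15, 1), (1, 5, 16, 1), (1, 5, 17, 2),
    (1, 5, 20, 1), (2, 5, 31, 1), (2, 6, 1, 1),
]
print(day_islands(sample))
print(day_islands(sample, min_hours=2))
```

Using real dates rather than the raw day number lets a run continue across a month boundary (5/31 to 6/1 for caller 2 in the sample). Setting `min_hours=10` would correspond to the "at least 10 consecutive hours" condition in the question, if that reading of it is the intended one.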