2014-12-19

BigQuery SQL for 28-day sliding window aggregate (without writing 28 lines of SQL)

I'm trying to calculate a 28-day moving sum in BigQuery using the LAG function.

The top answer to the question "Bigquery SQL for sliding window aggregate", from Felipe Hoffa, indicates that you can use the LAG function. An example would be:

SELECT 
    spend + spend_lagged_1day + spend_lagged_2day + spend_lagged_3day + ... + spend_lagged_27day as spend_28_day_sum, 
    user, 
    date 
FROM (
    SELECT spend, 
     LAG(spend, 1) OVER (PARTITION BY user ORDER BY date) spend_lagged_1day, 
     LAG(spend, 2) OVER (PARTITION BY user ORDER BY date) spend_lagged_2day, 
     LAG(spend, 3) OVER (PARTITION BY user ORDER BY date) spend_lagged_3day, 
     ... 
     LAG(spend, 27) OVER (PARTITION BY user ORDER BY date) spend_lagged_27day, 
     user, 
     date 
    FROM user_spend 
) 

Is there a way to do this without having to write out 28 lines of SQL?

Answer

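The BigQuery documentation doesn't do a good job of explaining the full complexity of the window functions the tool supports, because it doesn't specify which expressions can appear after ROWS or RANGE. It actually supports the SQL 2003 standard for window functions, which you can find documented elsewhere on the web, for example here.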

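This means you can get the effect you want with a single window function. The bound is 27 because that is the number of rows before the current row to include in the sum (27 preceding rows plus the current row make 28).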

SELECT spend, 
     SUM(spend) OVER (PARTITION BY user ORDER BY date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW), 
     user, 
     date 
FROM user_spend; 
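To make the ROWS frame concrete, here is a minimal plain-Python sketch (not BigQuery code) of what `SUM(spend) OVER (PARTITION BY user ORDER BY date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW)` computes. The helper name `rows_window_sum` and the `(user, date, spend)` tuple layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def rows_window_sum(records, preceding=27):
    """Hypothetical model of
    SUM(spend) OVER (PARTITION BY user ORDER BY date
                     ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW).

    records: iterable of (user, date, spend) tuples; `date` only has to be sortable.
    Returns a list of (user, date, spend, window_sum) tuples.
    """
    # PARTITION BY user
    by_user = defaultdict(list)
    for user, date, spend in records:
        by_user[user].append((date, spend))

    out = []
    for user, rows in by_user.items():
        rows.sort(key=lambda r: r[0])  # ORDER BY date within the partition
        for i, (date, spend) in enumerate(rows):
            # The frame is counted in physical rows, not in days.
            start = max(0, i - preceding)
            window_sum = sum(s for _, s in rows[start:i + 1])
            out.append((user, date, spend, window_sum))
    return out


# 30 consecutive days with spend 1: the sum grows to 28 and then stays there.
result = rows_window_sum([("a", day, 1) for day in range(30)])
print([r[3] for r in result])
```

For the first 27 rows of each user the frame simply holds fewer rows, so the result is a partial sum. The LAG version in the question instead returns NULL for those rows, because LAG yields NULL when the offset runs past the start of the partition, and NULL plus a number is NULL.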

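RANGE bounds are also very useful. If your table is missing dates for a user, then 27 PRECEDING rows will reach back more than 27 days, whereas RANGE builds the window from the date values themselves. In the following query, the date field is a BigQuery TIMESTAMP, and the range is specified in microseconds. I'd recommend that however you compute data like this in BigQuery, you test it thoroughly to make sure it gives you the answer you expect.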

SELECT spend, 
     SUM(spend) OVER (PARTITION BY user ORDER BY date RANGE BETWEEN 27 * 24 * 60 * 60 * 1000000 PRECEDING AND CURRENT ROW), 
     user, 
     date 
FROM user_spend;
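The difference between the two frames is easiest to see with a gap in the data. Below is a minimal plain-Python sketch (not BigQuery code) of the RANGE frame, assuming, as the answer states, that the TIMESTAMP is compared as integer microseconds. The helper name `range_window_sum`, the constant `USEC_PER_DAY`, and the tuple layout are hypothetical. The sketch rescans the whole partition for every row, which is fine for illustration but quadratic.

```python
from collections import defaultdict

USEC_PER_DAY = 24 * 60 * 60 * 1000000  # same factor as in the query above


def range_window_sum(records, days=27):
    """Hypothetical model of
    SUM(spend) OVER (PARTITION BY user ORDER BY date
                     RANGE BETWEEN days * USEC_PER_DAY PRECEDING AND CURRENT ROW),
    with `date` given as integer microseconds since the epoch.

    records: iterable of (user, ts_usec, spend) tuples.
    Returns a list of (user, ts_usec, spend, window_sum) tuples.
    """
    width = days * USEC_PER_DAY
    by_user = defaultdict(list)
    for user, ts_usec, spend in records:
        by_user[user].append((ts_usec, spend))

    out = []
    for user, rows in by_user.items():
        rows.sort(key=lambda r: r[0])  # ORDER BY date within the partition
        for ts, spend in rows:
            # A RANGE frame is defined by value, not by row count: every row whose
            # timestamp lies in [ts - width, ts] is included, including peer rows
            # that share the current row's timestamp.
            window_sum = sum(s for t, s in rows if ts - width <= t <= ts)
            out.append((user, ts, spend, window_sum))
    return out


# Spend on day 0, day 27 and day 30 for one user (days 1-26 and 28-29 missing).
records = [
    ("a", 0, 10),
    ("a", 27 * USEC_PER_DAY, 1),
    ("a", 30 * USEC_PER_DAY, 2),
]
for user, ts, spend, total in range_window_sum(records):
    print(user, ts // USEC_PER_DAY, spend, total)
```

On day 30 the RANGE frame covers days 3 through 30, so the day-0 spend drops out and the sum is 3. A `ROWS BETWEEN 27 PRECEDING` frame would still include the day-0 row there, because only two rows precede it, giving 13.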