2012-10-06 118 views
2

BigQuery moving average

Can anyone tell me how to calculate a moving average in BigQuery?

This is what I need, written MySQL-style:

SELECT T1.id, T1.value_column1, avg(T2.value_column1) 
FROM table1 T1 
INNER JOIN table1 T2 ON T2.Id BETWEEN T1.Id-19 AND T1.Id 
GROUP BY T1.id, T1.value_column1 

Answers

2

For a newer, more efficient answer, see https://stackoverflow.com/a/24943950/132438


Check out the new LAG() and LEAD() window functions. They let you walk through the result set without a self-join.

https://developers.google.com/bigquery/docs/query-reference#windowfunctions
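
As a rough illustration of the window-function approach, the self-join from the question could be written with AVG() over a window frame. This is a sketch under assumptions, not a tested query: the dataset name mydataset is hypothetical, and ROWS BETWEEN counts rows rather than Id values, so it matches the original join only if Id is consecutive with no gaps.

```sql
-- Sketch only: mydataset is a hypothetical dataset name.
SELECT
  id,
  value_column1,
  -- Average over the current row and the 19 rows before it, ordered by id.
  AVG(value_column1) OVER (
    ORDER BY id
    ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
  ) AS moving_avg
FROM [mydataset.table1]
```

Because the frame is evaluated in a single pass over the ordered rows, no intermediate join result has to be materialized.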

A different option uses JOIN EACH (this can become too slow, because a very large amount of data can be generated in the intermediate step):

SELECT a.SensorId SensorId, a.Timestamp, AVG(b.Data) AS avg_prev_hour_load 
FROM (
    SELECT * FROM [io_sensor_data.moscone_io13] 
    WHERE SensorId = 'XBee_40670EB0/mic') a 
JOIN EACH [io_sensor_data.moscone_io13] b 
ON a.SensorId = b.SensorId 
WHERE b.Timestamp BETWEEN (a.Timestamp - 3600000) AND a.Timestamp 
GROUP BY SensorId, a.Timestamp; 

(Based on Joe Celko's SQL problems.)

+0

For a newer and more efficient answer, see http://stackoverflow.com/a/24943950/132438.

4

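You can do the same thing, but since BigQuery only allows joins on equality, it takes a bit more work. Below is an example that computes a 6-month moving average of birth weights from the public natality sample.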

SELECT 
    --Convert months-since-year-0 back to year, month 
    -- Subtract 1 before dividing so that December (month 12) stays in its own year. 
    INTEGER((month - 1)/12) as year, 
    (month - 1) % 12 + 1 as month, 
    avg 
FROM (
    SELECT month, 
    -- Note that this average is the average over all of the data in the 
    -- last 6 months, not an average over the avg values for the last 6 months. 
    -- It is easy to compute the latter, if that is what is desired -- just 
    -- compute the average in the inner select, and take the average of those 
    -- here. 
    SUM(total_weight_per_month)/SUM(records_per_month) as avg 
    FROM (
    SELECT 
     -- Note we use t2.month here since that is what is compared against 
     -- 6 different t1 months. 
     t2.month as month, 
     t1.records_per_month as records_per_month,  
     t1.total_weight_per_month as total_weight_per_month 
    FROM (
     SELECT month, 
     COUNT(weight_pounds) as records_per_month, 
     SUM(weight_pounds) as total_weight_per_month, 
     -- This active field is the key that lets us join all of the 
     -- values against the values in the date subselect. 
     1 AS active 
     FROM (
     SELECT 
      -- Convert year and month fields to a single value that 
      -- has the number of months since year 0. This will allow 
      -- us to do math on the dates. 
      year * 12 + month AS month, 
      weight_pounds 
     FROM [publicdata:samples.natality] 
     WHERE weight_pounds > 0) 
     GROUP BY month) as t1 
    JOIN 
     -- We join the weights per month against a subselect that contains 
     -- all months. 
     (SELECT month, 1 as active 
     FROM 
     (SELECT 
      year * 12 + month AS month, 
     FROM [publicdata:samples.natality]) 
     GROUP BY month) as t2 
    ON t1.active = t2.active 
    -- Here is where we get the moving average -- we basically take the month 
    -- value from t1 and make it apply for 6 months. 
    WHERE t1.month >= t2.month AND t1.month - 6 < t2.month) 
    GROUP BY month 
    ORDER BY month desc) 
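
The same constant-key trick can be applied to the table from the question. The following is a minimal sketch, assuming a hypothetical dataset named mydataset and a helper column k that exists only to satisfy the equality-join requirement; the range condition then moves into the WHERE clause.

```sql
-- Sketch only: mydataset and the helper column k are hypothetical.
SELECT
  t1.id AS id,
  AVG(t2.value_column1) AS moving_avg
FROM
  (SELECT id, 1 AS k FROM [mydataset.table1]) AS t1
JOIN
  (SELECT id, value_column1, 1 AS k FROM [mydataset.table1]) AS t2
-- Every row matches every other row on the constant key (a cross join in disguise).
ON t1.k = t2.k
-- Keep only the 20-id window ending at each t1 row.
WHERE t2.id BETWEEN t1.id - 19 AND t1.id
GROUP BY id
```

Note that the join produces every pair of rows before the filter is applied, so this pattern is only practical when the table is small.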
+0

Thank you for your answer. It was very useful and solved my problem.

+1

For a newer and more efficient answer, see http://stackoverflow.com/a/24943950/132438.

0

I have created the following "Times" table:

Table Details: Dim_Periods 
Schema 
Date        TIMESTAMP 
Year        INTEGER 
Month       INTEGER 
day         INTEGER 
QUARTER     INTEGER 
DAYOFWEEK   INTEGER 
MonthStart  TIMESTAMP 
MonthEnd    TIMESTAMP 
WeekStart   TIMESTAMP 
WeekEnd     TIMESTAMP 
Back30Days  TIMESTAMP -- the date 30 days before "Date" 
Back7Days   TIMESTAMP -- the date 7 days before "Date" 

I handle a "running sum" with a query like this:

SELECT Date, COUNT(*) AS MovingCNT 
FROM 
  (SELECT Date, Back7Days 
   FROM DWH.Dim_Periods 
   WHERE Date < TIMESTAMP(CURRENT_DATE()) 
     AND Date >= DATE_ADD(CURRENT_TIMESTAMP(), -5, 'month') 
  ) P 
CROSS JOIN EACH 
  (SELECT repository_url, repository_created_at 
   FROM [publicdata:samples.github_timeline] 
  ) L 
-- Keep only the rows that fall inside each date's trailing 7-day window. 
WHERE TIMESTAMP(repository_created_at) >= Back7Days 
  AND TIMESTAMP(repository_created_at) <= Date 
GROUP EACH BY Date 

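Note that the same approach also works for "month-to-date", "week-to-date", "30 days back", and similar aggregations. However, performance is not the best: because of the Cartesian join, the query may take a long time on larger datasets. Hope this helps.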