Duplicating groups of records to fill multiple date gaps in Google BigQuery
2017-02-07

I found a similar question (Duplicating records to fill gap between dates in Google BigQuery), but its scenario is different and the answer there doesn't apply.

My data is structured like this (it is essentially a history of price changes for multiple products and partners):

+------------+---------+---------+-------+
| date       | product | partner | value |
+------------+---------+---------+-------+
| 2017-01-01 | a       | x       | 10    |
| 2017-01-01 | b       | x       | 15    |
| 2017-01-01 | a       | y       | 11    |
| 2017-01-01 | b       | y       | 16    |
| 2017-01-05 | b       | x       | 13    |
| 2017-01-07 | a       | y       | 15    |
| 2017-01-07 | a       | x       | 15    |
+------------+---------+---------+-------+

What I need is a query (written specifically in BigQuery Standard SQL) that, given a date range (in this case, 2017-01-01 to 2017-01-10), outputs the following result:

+------------+---------+---------+-------+
| date       | product | partner | value |
+------------+---------+---------+-------+
| 2017-01-01 | a       | x       | 10    |
| 2017-01-02 | a       | x       | 10    |
| 2017-01-03 | a       | x       | 10    |
| 2017-01-04 | a       | x       | 10    |
| 2017-01-05 | a       | x       | 10    |
| 2017-01-06 | a       | x       | 10    |
| 2017-01-07 | a       | x       | 15    |
| 2017-01-08 | a       | x       | 15    |
| 2017-01-09 | a       | x       | 15    |
| 2017-01-10 | a       | x       | 15    |
| 2017-01-01 | a       | y       | 11    |
| 2017-01-02 | a       | y       | 11    |
| 2017-01-03 | a       | y       | 11    |
| 2017-01-04 | a       | y       | 11    |
| 2017-01-05 | a       | y       | 11    |
| 2017-01-06 | a       | y       | 11    |
| 2017-01-07 | a       | y       | 15    |
| 2017-01-08 | a       | y       | 15    |
| 2017-01-09 | a       | y       | 15    |
| 2017-01-10 | a       | y       | 15    |
| 2017-01-01 | b       | x       | 15    |
| 2017-01-02 | b       | x       | 15    |
| 2017-01-03 | b       | x       | 15    |
| 2017-01-04 | b       | x       | 15    |
| 2017-01-05 | b       | x       | 13    |
| 2017-01-06 | b       | x       | 13    |
| 2017-01-07 | b       | x       | 13    |
| 2017-01-08 | b       | x       | 13    |
| 2017-01-09 | b       | x       | 13    |
| 2017-01-10 | b       | x       | 13    |
| 2017-01-01 | b       | y       | 16    |
| 2017-01-02 | b       | y       | 16    |
| 2017-01-03 | b       | y       | 16    |
| 2017-01-04 | b       | y       | 16    |
| 2017-01-05 | b       | y       | 16    |
| 2017-01-06 | b       | y       | 16    |
| 2017-01-07 | b       | y       | 16    |
| 2017-01-08 | b       | y       | 16    |
| 2017-01-09 | b       | y       | 16    |
| 2017-01-10 | b       | y       | 16    |
+------------+---------+---------+-------+

Basically, it is a price history with the gaps filled in for every date, for every combination of product and partner.

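I'm having a hard time figuring out how to do this, in particular how to generate multiple rows for the same date when no price change occurred on that date. Any ideas?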

What have you tried so far? I don't understand your question: how would you end up with multiple rows for each date? What determines them?

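@ElliottBrossard The premise is that, on every day the value does not change, I need to repeat the latest value for each combination of product and partner. So if I have two products and two partners, each day without a value change should have 4 rows carrying the latest values.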

Answer

Try the query below:

#standardSQL
-- Sample data from the question; replace this CTE with your own table.
WITH history AS (
    SELECT '2017-01-01' AS d, 'a' AS product, 'x' AS partner, 10 AS value UNION ALL
    SELECT '2017-01-01' AS d, 'b' AS product, 'x' AS partner, 15 AS value UNION ALL
    SELECT '2017-01-01' AS d, 'a' AS product, 'y' AS partner, 11 AS value UNION ALL
    SELECT '2017-01-01' AS d, 'b' AS product, 'y' AS partner, 16 AS value UNION ALL
    SELECT '2017-01-05' AS d, 'b' AS product, 'x' AS partner, 13 AS value UNION ALL
    SELECT '2017-01-07' AS d, 'a' AS product, 'y' AS partner, 15 AS value UNION ALL
    SELECT '2017-01-07' AS d, 'a' AS product, 'x' AS partner, 15 AS value
),
-- One row per calendar day in the requested range.
daterange AS (
    SELECT date_in_range
    FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-01-10')) AS date_in_range
),
-- Pair each price change with the date of the next change for the same
-- product/partner (NULL for the latest change). The window's ORDER BY defines
-- "next", so no ORDER BY is needed in this CTE. ISO-formatted date strings
-- sort correctly as text.
temp AS (
    SELECT d, product, partner, value,
           LEAD(d) OVER(PARTITION BY product, partner ORDER BY d) AS next_d
    FROM history
)
-- Each day takes the value in effect on it: on or after the change date
-- and before the next change (or open-ended if there is no next change).
SELECT date_in_range, product, partner, value
FROM daterange
JOIN temp
ON daterange.date_in_range >= PARSE_DATE('%Y-%m-%d', temp.d)
AND (daterange.date_in_range < PARSE_DATE('%Y-%m-%d', temp.next_d) OR temp.next_d IS NULL)
ORDER BY product, partner, date_in_range
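To sanity-check the logic outside BigQuery, here is a minimal Python sketch of the same idea, assuming the sample history from the question: group the changes per (product, partner), pair each change with the next change date (the role `LEAD` plays in the query), then emit every day in the range that falls on or after a change and before the next one. The names `HISTORY` and `fill_gaps` are hypothetical and only for illustration.

```python
from datetime import date, timedelta

# Hypothetical sample data mirroring the question: (date, product, partner, value)
HISTORY = [
    (date(2017, 1, 1), "a", "x", 10),
    (date(2017, 1, 1), "b", "x", 15),
    (date(2017, 1, 1), "a", "y", 11),
    (date(2017, 1, 1), "b", "y", 16),
    (date(2017, 1, 5), "b", "x", 13),
    (date(2017, 1, 7), "a", "y", 15),
    (date(2017, 1, 7), "a", "x", 15),
]


def fill_gaps(history, start, end):
    """Carry each (product, partner) value forward to every day in [start, end].

    Output is ordered by product, partner, date, like the SQL ORDER BY.
    """
    # Group the price changes per (product, partner)
    groups = {}
    for d, product, partner, value in history:
        groups.setdefault((product, partner), []).append((d, value))

    # Equivalent of GENERATE_DATE_ARRAY(start, end): every day, inclusive
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

    out = []
    for product, partner in sorted(groups):
        changes = sorted(groups[(product, partner)])
        for i, (d, value) in enumerate(changes):
            # Equivalent of LEAD(d): the next change date, or None if last
            next_d = changes[i + 1][0] if i + 1 < len(changes) else None
            # Same condition as the JOIN ... ON clause in the query
            for day in days:
                if day >= d and (next_d is None or day < next_d):
                    out.append((day, product, partner, value))
    return out


if __name__ == "__main__":
    for row in fill_gaps(HISTORY, date(2017, 1, 1), date(2017, 1, 10)):
        print(row)
```

As with the inner join in the query, days before a combination's first recorded change are not emitted, so a combination whose history starts after the range start will have no rows for those earlier days.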