2016-11-30 21 views

BigQuery - combine tables based on matching value or timestamp

Similar to the question "BigQuery combine tables based on closest timerstamp and matching value".

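I have three tables. For each row of table numberTwo, I need to get the hint from table numberOne: among the rows with the same cod value, the one whose time1 is closest to time2. If the cod is not present in table numberOne, the query should instead get the hint with the matching cod from table numberThree.

To make it easier to see what I need to do: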

Table numberOne:

| id | cod | hint | time1                   |
---------------------------------------------
| 1  | ABC | V    | 2016-11-03 18:00:00 UTC |
| 2  | ABC | W    | 2016-11-03 12:00:00 UTC |
| 3  | CDE | X    | 2016-11-03 19:00:00 UTC |
| 4  | CDE | Y    | 2016-11-03 19:30:00 UTC |
| 5  | EFG | Z    | 2016-11-03 18:00:00 UTC |

Table numberTwo:

| id | cod | value | time2                   |
----------------------------------------------
| 1  | ABC | xyz2  | 2016-11-03 18:20:00 UTC |
| 2  | FHK | h323  | 2016-11-03 11:30:00 UTC |
| 3  | ABC | rewq  | 2016-11-03 09:00:00 UTC |
| 4  | IJK | abce  | 2016-11-03 19:10:00 UTC |

Table numberThree:

| id | cod | hint |
-------------------
| 1  | FHK | tes1 |
| 2  | IJK | tes2 |
| 3  | MNK | tes3 |
| 4  | MOP | tes4 |

So, for row #1 of table numberTwo, I would get all rows of table numberOne with cod ABC:

| 1 | ABC | V | 2016-11-03 18:00:00 UTC | 
| 2 | ABC | W | 2016-11-03 12:00:00 UTC | 

And among these I would take the one whose timestamp is closest to time2:

| 1 | ABC | V | 2016-11-03 18:00:00 UTC | 
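
To make the "closest timestamp" step concrete, here is a minimal sketch in BigQuery Standard SQL for row #1 of numberTwo only. The CTE name `candidates` and the column `diff_seconds` are made up for illustration, and the time2 value is hard-coded from the sample data above.

```sql
-- The two numberOne rows with cod 'ABC', taken from the sample data.
WITH candidates AS (
  SELECT 1 AS id, 'ABC' AS cod, 'V' AS hint, TIMESTAMP '2016-11-03 18:00:00 UTC' AS time1 UNION ALL
  SELECT 2, 'ABC', 'W', TIMESTAMP '2016-11-03 12:00:00 UTC'
)
SELECT
  id, cod, hint, time1,
  -- Absolute distance, in seconds, between time2 of numberTwo row #1 and each time1.
  ABS(TIMESTAMP_DIFF(TIMESTAMP '2016-11-03 18:20:00 UTC', time1, SECOND)) AS diff_seconds
FROM candidates
ORDER BY diff_seconds
LIMIT 1
```

The distances are 1200 seconds for hint V and 22800 seconds for hint W, so the row with hint V is kept.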

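If the cod is not present in table numberOne, it is matched against table numberThree instead. The cods in numberOne and numberThree are unique to each table, so the same cod never appears in both. This means the query could just as well try to match against table numberThree first.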
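
Since a cod never appears in both lookup tables, the fallback can be expressed with two LEFT JOINs and COALESCE. The following is only a sketch on a hypothetical miniature of the data (the CTEs `one`, `two`, and `three` are invented here and hold one numberOne row per cod, so the closest-time selection is deliberately left out):

```sql
WITH
one AS (SELECT 'ABC' AS cod, 'V' AS hint),
two AS (SELECT 1 AS id, 'ABC' AS cod UNION ALL SELECT 2, 'FHK'),
three AS (SELECT 'FHK' AS cod, 'tes1' AS hint)
SELECT
  two.id,
  two.cod,
  -- At most one side is non-NULL, because the two lookup tables share no cod.
  COALESCE(one.hint, three.hint) AS hint
FROM two
LEFT JOIN one ON one.cod = two.cod
LEFT JOIN three ON three.cod = two.cod
ORDER BY two.id
```

With this miniature data, id 1 takes its hint from `one` and id 2 falls back to `three`. The join order does not matter here, which is why matching numberThree first would also work.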

Desired table:

| id | cod | hint | value | time2                   |
-----------------------------------------------------
| 1  | ABC | V    | xyz2  | 2016-11-03 18:20:00 UTC |
| 2  | FHK | tes1 | h323  |                         |
| 3  | ABC | W    | rewq  | 2016-11-03 09:00:00 UTC |
| 4  | IJK | tes2 | abce  |                         |
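What is the schema of numberThree? After processing each row, do I also end up with a table like this? What is the expected output schema? Please add a simple example!

@MikhailBerlyant I have updated the question.

Did this work?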

Answer

Try the query below (BigQuery Standard SQL):

WITH 
/* Sample data from the question: delete this line and the closing one to run the query standalone.
TableNumberOne AS (
    SELECT 1 AS id, 'ABC' AS cod, 'V' AS hint, TIMESTAMP '2016-11-03 18:00:00 UTC' AS time1 UNION ALL 
    SELECT 2 AS id, 'ABC' AS cod, 'W' AS hint, TIMESTAMP '2016-11-03 12:00:00 UTC' AS time1 UNION ALL 
    SELECT 3 AS id, 'CDE' AS cod, 'X' AS hint, TIMESTAMP '2016-11-03 19:00:00 UTC' AS time1 UNION ALL 
    SELECT 4 AS id, 'CDE' AS cod, 'Y' AS hint, TIMESTAMP '2016-11-03 19:30:00 UTC' AS time1 UNION ALL 
    SELECT 5 AS id, 'EFG' AS cod, 'Z' AS hint, TIMESTAMP '2016-11-03 18:00:00 UTC' AS time1 
), 
TableNumberTwo AS (
    SELECT 1 AS id, 'ABC' AS cod, 'xyz2' AS value, TIMESTAMP '2016-11-03 18:20:00 UTC' AS time2 UNION ALL 
    SELECT 2 AS id, 'FHK' AS cod, 'h323' AS value, TIMESTAMP '2016-11-03 11:30:00 UTC' AS time2 UNION ALL 
    SELECT 3 AS id, 'ABC' AS cod, 'rewq' AS value, TIMESTAMP '2016-11-03 09:00:00 UTC' AS time2 UNION ALL 
    SELECT 4 AS id, 'IJK' AS cod, 'abce' AS value, TIMESTAMP '2016-11-03 19:10:00 UTC' AS time2 
), 
TableNumberThree AS (
    SELECT 1 AS id, 'FHK' AS cod, 'tes1' AS hint UNION ALL 
    SELECT 2 AS id, 'IJK' AS cod, 'tes2' AS hint UNION ALL 
    SELECT 3 AS id, 'MNK' AS cod, 'tes3' AS hint UNION ALL 
    SELECT 4 AS id, 'MOP' AS cod, 'tes4' AS hint 
), 
*/ 
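-- Rank each numberTwo row's candidate matches from numberOne by absolute time distance.
-- The LEFT JOIN keeps numberTwo rows whose cod has no match (their hint stays NULL).
tempTable AS (
    SELECT 
    t2.id, t2.cod, t2.value, t2.time2, t1.hint, 
    ROW_NUMBER() OVER(PARTITION BY t2.id, t2.cod, t2.value 
         ORDER BY ABS(TIMESTAMP_DIFF(t2.time2, t1.time1, SECOND))) AS win 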
    FROM TableNumberTwo AS t2 
    LEFT JOIN TableNumberOne AS t1 
    ON t1.cod = t2.cod 
) 
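-- Keep only the closest numberOne match per numberTwo row (win = 1).
-- Rows without a numberOne match take their hint from numberThree and get a NULL time2.
SELECT 
    tmp.id, tmp.cod, IFNULL(tmp.hint, t3.hint) AS hint, tmp.value, 
    IF(tmp.hint IS NULL, NULL, tmp.time2) AS time2 
FROM tempTable AS tmp 
LEFT JOIN TableNumberThree AS t3 
ON tmp.cod = t3.cod AND tmp.hint IS NULL 
WHERE tmp.win = 1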
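
For comparison, the same result can probably be reached without a window function by aggregating the candidate hints per numberTwo row. This is an alternative sketch, not the query from the answer: the CTE name `closest` and the aliases `c` and `t3` are invented here, it assumes hint is never NULL in TableNumberOne, and it includes the sample data inline so it can be run as-is in Standard SQL.

```sql
WITH
TableNumberOne AS (
  SELECT 1 AS id, 'ABC' AS cod, 'V' AS hint, TIMESTAMP '2016-11-03 18:00:00 UTC' AS time1 UNION ALL
  SELECT 2, 'ABC', 'W', TIMESTAMP '2016-11-03 12:00:00 UTC' UNION ALL
  SELECT 3, 'CDE', 'X', TIMESTAMP '2016-11-03 19:00:00 UTC' UNION ALL
  SELECT 4, 'CDE', 'Y', TIMESTAMP '2016-11-03 19:30:00 UTC' UNION ALL
  SELECT 5, 'EFG', 'Z', TIMESTAMP '2016-11-03 18:00:00 UTC'
),
TableNumberTwo AS (
  SELECT 1 AS id, 'ABC' AS cod, 'xyz2' AS value, TIMESTAMP '2016-11-03 18:20:00 UTC' AS time2 UNION ALL
  SELECT 2, 'FHK', 'h323', TIMESTAMP '2016-11-03 11:30:00 UTC' UNION ALL
  SELECT 3, 'ABC', 'rewq', TIMESTAMP '2016-11-03 09:00:00 UTC' UNION ALL
  SELECT 4, 'IJK', 'abce', TIMESTAMP '2016-11-03 19:10:00 UTC'
),
TableNumberThree AS (
  SELECT 1 AS id, 'FHK' AS cod, 'tes1' AS hint UNION ALL
  SELECT 2, 'IJK', 'tes2' UNION ALL
  SELECT 3, 'MNK', 'tes3' UNION ALL
  SELECT 4, 'MOP', 'tes4'
),
closest AS (
  SELECT
    t2.id, t2.cod, t2.value, t2.time2,
    -- Keep only the hint of the nearest numberOne row; NULL when no row matched.
    ARRAY_AGG(t1.hint IGNORE NULLS
              ORDER BY ABS(TIMESTAMP_DIFF(t2.time2, t1.time1, SECOND))
              LIMIT 1)[SAFE_OFFSET(0)] AS hint
  FROM TableNumberTwo AS t2
  LEFT JOIN TableNumberOne AS t1
    ON t1.cod = t2.cod
  GROUP BY t2.id, t2.cod, t2.value, t2.time2
)
SELECT
  c.id, c.cod,
  COALESCE(c.hint, t3.hint) AS hint,
  c.value,
  -- time2 is blanked for rows whose hint came from numberThree, as in the desired table.
  IF(c.hint IS NULL, NULL, c.time2) AS time2
FROM closest AS c
LEFT JOIN TableNumberThree AS t3
  ON t3.cod = c.cod AND c.hint IS NULL
ORDER BY c.id
```

On the sample data this should return the four rows of the desired table. Like the ROW_NUMBER version, it picks an arbitrary row when two candidates are equally close, so add a tie-breaker to the ORDER BY if that matters.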