2016-05-24 26 views
0

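BigQuery: get rows plus information about the closest preceding row where column x has some value

Suppose I have a BigQuery table with the following schema: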

name  | type 
---------------------- 
id   | STRING 
timestamp | TIMESTAMP 
event_type | STRING 
some_value | STRING 
... 

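I want all events of type 'x'. However, I also want to add an extra parameter to each returned row. This parameter is a boolean that should be TRUE if the most recent preceding event WHERE event_type='y' has some_value='necessary value'.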

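For example, given the following rows, sorted by timestamp in ascending order (here 'true value' plays the role of the necessary value):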

event_type | some_value 
------------------------ 
y   | 'true value' 
x   | 'not relevant' 
y   | 'false value' 
x   | 'not relevant2' 
y   | 'true value' 
y   | 'false value' 
x   | 'not relevant3' 
x   | 'not relevant4' 

I would get the following rows back from my query:

event_type | some_value  | previous_true 
------------------------------------- 
x   | 'not relevant' | TRUE 
x   | 'not relevant2' | FALSE 
x   | 'not relevant3' | FALSE 
x   | 'not relevant4' | FALSE 

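I thought a join might do the trick, but I can't figure out how it would work. At first LAG also seemed like a good idea, but then I realized that LAG takes the immediately preceding row, whatever its event_type is, so I don't see how I would use it here.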

Answers

2

With BigQuery Standard SQL, try the query below.
Make sure to uncheck the Use Legacy SQL checkbox under Show Options.

WITH YourTable AS (
    SELECT 1 AS ts, 'y' AS event_type, 'true value' AS some_value UNION ALL 
    SELECT 2 AS ts, 'x' AS event_type, 'not relevant' AS some_value UNION ALL 
    SELECT 3 AS ts, 'y' AS event_type, 'false value' AS some_value UNION ALL 
    SELECT 4 AS ts, 'x' AS event_type, 'not relevant2' AS some_value UNION ALL 
    SELECT 5 AS ts, 'y' AS event_type, 'true value' AS some_value UNION ALL 
    SELECT 6 AS ts, 'y' AS event_type, 'false value' AS some_value UNION ALL 
    SELECT 7 AS ts, 'x' AS event_type, 'not relevant3' AS some_value UNION ALL 
    SELECT 8 AS ts, 'x' AS event_type, 'not relevant4' AS some_value 
) 
SELECT 
    event_type, 
    some_value, 
    (SELECT some_value = 'true value' FROM YourTable 
    WHERE event_type = 'y' AND ts < a.ts 
    ORDER BY ts DESC LIMIT 1 
    ) AS previous_true 
FROM YourTable AS a 
WHERE event_type = 'x' 
ORDER BY ts 

The result is:

event_type some_value  previous_true  
x   not relevant true  
x   not relevant2 false  
x   not relevant3 false  
x   not relevant4 false  
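A window-function variant of the same idea may also be worth trying in Standard SQL, since it avoids the correlated subquery altogether. This is only a sketch against the same sample data: the CTE name WithLastY and the column name last_y_value are made up for illustration, and it assumes ts gives a strict ordering of the rows. LAST_VALUE with IGNORE NULLS skips the 'x' rows, which is what plain LAG cannot do.

```sql
WITH YourTable AS (
    SELECT 1 AS ts, 'y' AS event_type, 'true value' AS some_value UNION ALL
    SELECT 2 AS ts, 'x' AS event_type, 'not relevant' AS some_value UNION ALL
    SELECT 3 AS ts, 'y' AS event_type, 'false value' AS some_value UNION ALL
    SELECT 4 AS ts, 'x' AS event_type, 'not relevant2' AS some_value UNION ALL
    SELECT 5 AS ts, 'y' AS event_type, 'true value' AS some_value UNION ALL
    SELECT 6 AS ts, 'y' AS event_type, 'false value' AS some_value UNION ALL
    SELECT 7 AS ts, 'x' AS event_type, 'not relevant3' AS some_value UNION ALL
    SELECT 8 AS ts, 'x' AS event_type, 'not relevant4' AS some_value
),
-- WithLastY and last_y_value are hypothetical names used only in this sketch
WithLastY AS (
    SELECT
        ts, event_type, some_value,
        -- some_value of the latest 'y' row strictly before this row;
        -- 'x' rows contribute NULL and are skipped by IGNORE NULLS
        LAST_VALUE(IF(event_type = 'y', some_value, NULL) IGNORE NULLS)
            OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS last_y_value
    FROM YourTable
)
SELECT
    event_type,
    some_value,
    last_y_value = 'true value' AS previous_true
FROM WithLastY
WHERE event_type = 'x'
ORDER BY ts
```

As with the subquery version, previous_true is NULL for an 'x' row that has no earlier 'y' row. To run it against a real table, drop the YourTable CTE and select from the table directly.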

For BigQuery Legacy SQL, try:

SELECT
    event_type, some_value,
    previous_true = 'true value' AS previous_true
FROM (
    SELECT
        ts, event_type, some_value,
        FIRST_VALUE(some_value) OVER(PARTITION BY grp ORDER BY ts) AS previous_true
    FROM (
        SELECT
            ts, event_type, some_value,
            SUM(step) OVER(ORDER BY ts) AS grp
        FROM (
            SELECT
                ts, event_type, some_value,
                IF(event_type = 'x', 0, 1) AS step
            FROM
                (SELECT 1 AS ts, 'y' AS event_type, 'true value' AS some_value),
                (SELECT 2 AS ts, 'x' AS event_type, 'not relevant' AS some_value),
                (SELECT 3 AS ts, 'y' AS event_type, 'false value' AS some_value),
                (SELECT 4 AS ts, 'x' AS event_type, 'not relevant2' AS some_value),
                (SELECT 5 AS ts, 'y' AS event_type, 'true value' AS some_value),
                (SELECT 6 AS ts, 'y' AS event_type, 'false value' AS some_value),
                (SELECT 7 AS ts, 'x' AS event_type, 'not relevant3' AS some_value),
                (SELECT 8 AS ts, 'x' AS event_type, 'not relevant4' AS some_value)
        )
    )
)
WHERE event_type = 'x'
ORDER BY ts

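So the legacy version works just fine. However, I ran into a problem when I tried the non-legacy version. When I substituted 'mydataset.table' for 'YourTable', I got a 'Query is not supported' error. That is the full extent of the error message I could find.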


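If you need this to work with standard SQL but still have an issue to resolve, please post a separate question with more details (the exact query, etc.). The comment format doesn't allow for effective help in a case like this. I'd be more than happy to help there.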

0

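Here is one method: you can use a running max over the 'y' rows to get the id of the most recent y for each 'x'. Then use a join for the calculation: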

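select t.*,
       -- some_value is qualified with ty so it refers to the matched 'y' row
       (case when ty.some_value = 'necessary value' then 1 else 0 end) as previous_true
from (select t.*,
             -- running max: id of the most recent 'y' row at or before this row
             max(case when event_type = 'y' then id end) over (order by timestamp) as yid
      from t
     ) t join
     t ty
     on ty.id = t.yid
where t.event_type = 'x';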

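I don't know the exact roles of id and timestamp. This version assumes that id always increases along with timestamp. Alternatively, you could use timestamp instead, but it is unclear whether that is sufficient for the join.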
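For what it's worth, the timestamp-based alternative could look roughly like the sketch below. It assumes that no two 'y' events share the same timestamp (otherwise the join would duplicate rows), and the column name y_timestamp is made up for illustration. It also swaps the inner join for a LEFT JOIN, so that 'x' rows with no earlier 'y' row are kept with previous_true = 0 instead of being dropped.

```sql
select t.*,
       (case when ty.some_value = 'necessary value' then 1 else 0 end) as previous_true
from (select t.*,
             -- running max: timestamp of the most recent 'y' row at or before this row
             -- (y_timestamp is a hypothetical column name)
             max(case when event_type = 'y' then timestamp end) over (order by timestamp) as y_timestamp
      from t
     ) t left join
     t ty
     -- assumes 'y' timestamps are unique, so at most one row matches
     on ty.timestamp = t.y_timestamp and ty.event_type = 'y'
where t.event_type = 'x';
```

Because a running max of timestamps is monotonic by construction, this variant does not depend on id increasing with timestamp.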
