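Grouping distinct rows in Google BigQuery takes much longer

I am running a funnel analysis on the publicdata:samples.github_timeline dataset in Google BigQuery. I want to extract all unique users who performed a sequence of three events in chronological order. The events, in order: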
- WatchEvent
- PushEvent
- CreateEvent
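To pin down what "in chronological order" means here, the following is a minimal sketch of the intended funnel condition on toy data, not on the real dataset. The rows, the user names, and the helper `funnel_users` are hypothetical and only illustrate the rule: a user qualifies if they have a WatchEvent, then a strictly later PushEvent, then a strictly later CreateEvent.

```python
from collections import defaultdict

# Hypothetical toy rows: (user, event type, timestamp).
# These are NOT real github_timeline rows.
events = [
    ("alice", "WatchEvent", 1),
    ("alice", "PushEvent", 2),
    ("alice", "CreateEvent", 3),
    ("bob", "PushEvent", 1),
    ("bob", "WatchEvent", 2),
    ("bob", "CreateEvent", 3),
    ("carol", "WatchEvent", 5),
    ("carol", "CreateEvent", 6),
    ("carol", "PushEvent", 7),
    ("carol", "CreateEvent", 8),
]

FUNNEL = ["WatchEvent", "PushEvent", "CreateEvent"]


def funnel_users(rows, steps=FUNNEL):
    """Return users who performed every step, each strictly later than the previous one."""
    by_user = defaultdict(list)
    for user, kind, ts in rows:
        by_user[user].append((ts, kind))

    matched = set()
    for user, user_events in by_user.items():
        last_ts = None
        completed = 0
        for step in steps:
            # Pick the earliest occurrence of this step after the previous step;
            # taking the earliest keeps the most room for the following steps.
            candidates = [
                ts for ts, kind in user_events
                if kind == step and (last_ts is None or ts > last_ts)
            ]
            if not candidates:
                break
            last_ts = min(candidates)
            completed += 1
        if completed == len(steps):
            matched.add(user)
    return matched


print(sorted(funnel_users(events)))  # ['alice', 'carol']
```

In this toy data, bob is excluded because his only PushEvent happens before his WatchEvent.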
Here is the query:
select user from (
  SELECT user1 as user,
         ts1 as eventDate1,
         ts2 as eventDate2,
         IF(ts2 < ts3, ts3, NULL) as eventDate3
  FROM
    (SELECT user1,
            ts1,
            ts2,
            ts3
     FROM (SELECT user1,
                  ts1,
                  IF(ts1 < ts2, ts2, NULL) as ts2
           FROM
             (SELECT user1,
                     ts1,
                     ts2
              FROM (SELECT repository_owner as user1,
                           created_at as ts1
                    FROM [publicdata:samples.github_timeline]
                    WHERE type = "WatchEvent") as step1
              LEFT JOIN EACH (SELECT repository_owner as user2,
                                     created_at as ts2
                              FROM [publicdata:samples.github_timeline]
                              WHERE type = "PushEvent") as step2
              ON user1 = user2 where ts1 is not NULL)
          ) as steps1_2
     LEFT JOIN (SELECT repository_owner as user3,
                       created_at as ts3
                FROM [publicdata:samples.github_timeline]
                WHERE type = "CreateEvent") as step3
     ON user1 = user3
     where ts2 is not NULL
    )
)
where eventDate3 is not null
group by user
limit 100
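Without the GROUP BY user at the end, the query is very fast (about 10 seconds). As soon as I add it, the query takes a very long time to finish (more than 20 minutes).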
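One factor that may be relevant, stated as an assumption rather than a confirmed diagnosis: the three steps are joined on the user alone, so each user can contribute roughly (watch events) x (push events) x (create events) candidate rows before the timestamp filters apply. A grouping step has to consume all of those rows, whereas a bare LIMIT 100 may be able to stop early. The sketch below only illustrates that arithmetic; the event counts are made up and `joined_row_count` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical per-user event counts; NOT measured from github_timeline.
events = (
    [("alice", "WatchEvent")] * 3
    + [("alice", "PushEvent")] * 4
    + [("alice", "CreateEvent")] * 5
    + [("bob", "WatchEvent")] * 2
    + [("bob", "PushEvent")] * 10
    + [("bob", "CreateEvent")] * 1
)


def joined_row_count(rows):
    """Rough count of rows produced by joining the three steps on user alone.

    Ignores the timestamp filters, so it is an estimate for users who have
    all three event types, not an exact figure for the query.
    """
    counts = Counter(rows)  # (user, event type) -> number of events
    users = {user for user, _ in rows}
    return sum(
        counts[(u, "WatchEvent")]
        * counts[(u, "PushEvent")]
        * counts[(u, "CreateEvent")]
        for u in users
    )


distinct_users = len({user for user, _ in events})
print(joined_row_count(events), "joined rows for", distinct_users, "distinct users")
# 80 joined rows for 2 distinct users
```

Even in this tiny example, the grouping step would see 80 rows in order to return 2 users.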
What is wrong with the query? You can test it here: https://bigquery.cloud.google.com/
I tried it without the LIMIT 100 and it took 23 minutes.