2016-09-20 109 views
2

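BigQuery join of three tables

I want to join three tables in BigQuery. Table 1 holds the records of one event (each row is one record), table 2 holds the records of a second event, and table 3 holds the category names.

I want to produce a final table with the counts from table 1 and table 2, broken down by category and device platform. However, every time I run the query I get an error saying that joined.t3.category is not a field of either table in the join.

Here is my current code: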

SELECT
    COUNT(DISTINCT joined.t1.Id) AS t1_events,
    COUNT(DISTINCT t2.Id) AS t2_events,
    joined.t1.Origin AS platform,
    joined.t3.category AS category
FROM (
    SELECT
        Id,
        Origin,
        CatId
    FROM [testing.table_1] AS t1
    JOIN (
        SELECT
            category,
            CategoryID
        FROM [testing.table_3]
    ) AS t3
    ON t1.CatId = t3.CategoryID
) AS joined
JOIN (
    SELECT
        Id,
        CategoryId
    FROM [testing.table_2]
) AS t2
ON (joined.t1.CatId = t2.CategoryId)
GROUP BY platform, category;

For reference, here is a simple join between table 1 and table 2 that works perfectly:

SELECT
    COUNT(DISTINCT t1.Id) AS t1_event,
    COUNT(DISTINCT t2.Id) AS t2_events,
    t1.Origin AS platform
FROM testing.table_1 AS t1
JOIN testing.table_2 AS t2
ON t1.CatId = t2.CategoryId
GROUP BY platform;

Answers

1

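The simple fix is to add the category field to the first inner SELECT. Otherwise it is not visible to the outermost SELECT, hence the error. That is the whole problem.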
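The visibility rule described above can be reproduced outside BigQuery. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in engine; the table and column names mirror the question, while the sample rows are invented for illustration. Note that SQLite addresses the subquery column as joined.category, whereas the legacy SQL queries in this thread write joined.t3.category.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery purely to illustrate scoping.
# Table and column names mirror the question; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (Id INTEGER, Origin TEXT, CatId INTEGER);
    CREATE TABLE table_3 (category TEXT, CategoryID INTEGER);
    INSERT INTO table_1 VALUES (1, 'ios', 10), (2, 'android', 10);
    INSERT INTO table_3 VALUES ('news', 10);
""")

# The inner SELECT omits category, so the outer query cannot see it.
without_category = """
    SELECT joined.category FROM (
        SELECT Id, Origin, CatId
        FROM table_1 AS t1 JOIN table_3 AS t3 ON t1.CatId = t3.CategoryID
    ) AS joined
"""

# The inner SELECT lists category, so the outer query can reference it.
with_category = """
    SELECT joined.category FROM (
        SELECT Id, Origin, CatId, category
        FROM table_1 AS t1 JOIN table_3 AS t3 ON t1.CatId = t3.CategoryID
    ) AS joined
"""

try:
    conn.execute(without_category).fetchall()
    missing_column_error = None
except sqlite3.OperationalError as exc:
    missing_column_error = str(exc)

rows = conn.execute(with_category).fetchall()

print(missing_column_error)  # an error naming the missing column
print(rows)                  # [('news',), ('news',)]
```

The first query fails because the subquery only exposes the columns it selects; the second succeeds once category is added to the inner column list.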

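Also, in BigQuery legacy SQL you can use EXACT_COUNT_DISTINCT; otherwise you get a statistical approximation. See the COUNT([DISTINCT]) documentation for more.

So, for legacy SQL, you can try the query below: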

SELECT 
    EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events, 
    EXACT_COUNT_DISTINCT(t2.Id) AS t2_events, 
    joined.t1.Origin AS platform, 
    joined.t3.category AS category 
FROM (
    SELECT 
    Id, Origin, CatId, category 
    FROM [testing.table_1] AS t1 
    JOIN (SELECT category, CategoryID FROM [testing.table_3]) AS t3 
    ON t1.CatId = t3.CategoryID 
) AS joined 
JOIN (SELECT Id, CategoryId FROM [testing.table_2]) AS t2 
ON joined.t1.CatId = t2.CategoryId 
GROUP BY platform, category 

I also think you can simplify it further (assuming there are no ambiguous fields):

SELECT 
    EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events, 
    EXACT_COUNT_DISTINCT(t2.Id) AS t2_events, 
    joined.t1.Origin AS platform, 
    joined.t3.category AS category 
FROM (
    SELECT 
    Id, Origin, CatId, category 
    FROM [testing.table_1] AS t1 
    JOIN [testing.table_3] AS t3 
    ON t1.CatId = t3.CategoryID 
) AS joined 
JOIN [testing.table_2] AS t2 
ON joined.t1.CatId = t2.CategoryId 
GROUP BY platform, category 

And of course, if you use standard SQL (as Elliott suggested), you need the version below, where the subquery's columns are referenced through joined alone:

SELECT 
    COUNT(DISTINCT joined.Id) AS t1_events, 
    COUNT(DISTINCT t2.Id) AS t2_events, 
    joined.Origin AS platform, 
    joined.category AS category 
FROM (
    SELECT 
    Id, Origin, CatId, category 
    FROM `testing.table_1` AS t1 
    JOIN `testing.table_3` AS t3 
    ON t1.CatId = t3.CategoryID 
) AS joined 
JOIN `testing.table_2` AS t2 
ON joined.CatId = t2.CategoryId 
GROUP BY platform, category 
+0

You are the real MVP. This works perfectly.

0

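I don't know Google BigQuery, but my SQL knowledge says that having two aliases in front of a column name will cause problems. Try dropping the inner t-alias, for example use joined.category instead of joined.t3.category.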

1

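Could you try using standard SQL for your query instead? It has better alias handling, and COUNT(DISTINCT ...) gives you exact results rather than an approximation as in legacy SQL. If it helps, the main modification you need to make to the query is to use backticks to escape your table names instead of brackets. In standard SQL the subquery's columns are also referenced through joined alone (joined.Id rather than joined.t1.Id), and category still has to be listed in the inner SELECT. For example:

SELECT 
    COUNT(DISTINCT joined.Id) AS t1_events, 
    COUNT(DISTINCT t2.Id) AS t2_events, 
    joined.Origin AS platform, 
    joined.category AS category 
FROM (
    SELECT 
        Id, 
        Origin, 
        CatId, 
        category 
    FROM `testing.table_1` AS t1 
    JOIN (
        SELECT 
            category, 
            CategoryID 
        FROM `testing.table_3` 
    ) AS t3 
    ON t1.CatId = t3.CategoryID 
) AS joined 
JOIN (
    SELECT 
        Id, 
        CategoryId 
    FROM `testing.table_2` 
) AS t2 
ON joined.CatId = t2.CategoryId 
GROUP BY platform, category;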
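
The three-table query discussed in this thread can be tried end to end on a small data set. This is a rough sketch, assuming SQLite (via Python's sqlite3 module) in place of BigQuery; the table and column names follow the question, the sample rows are hypothetical, and the subquery columns are addressed as joined.Id and joined.category, which is how SQLite and other standard dialects expose them.

```python
import sqlite3

# Assumption: SQLite replaces BigQuery here; all rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (Id INTEGER, Origin TEXT, CatId INTEGER);
    CREATE TABLE table_2 (Id INTEGER, CategoryId INTEGER);
    CREATE TABLE table_3 (category TEXT, CategoryID INTEGER);
    INSERT INTO table_1 VALUES (1, 'ios', 10), (2, 'ios', 10), (3, 'android', 20);
    INSERT INTO table_2 VALUES (101, 10), (102, 10), (103, 10), (104, 20);
    INSERT INTO table_3 VALUES ('news', 10), ('sports', 20);
""")

# category is selected in the inner query so the outer query can group by it.
query = """
    SELECT
        COUNT(DISTINCT joined.Id) AS t1_events,
        COUNT(DISTINCT t2.Id) AS t2_events,
        joined.Origin AS platform,
        joined.category AS category
    FROM (
        SELECT t1.Id, t1.Origin, t1.CatId, t3.category
        FROM table_1 AS t1
        JOIN table_3 AS t3 ON t1.CatId = t3.CategoryID
    ) AS joined
    JOIN table_2 AS t2 ON joined.CatId = t2.CategoryId
    GROUP BY platform, category
    ORDER BY platform, category
"""

results = conn.execute(query).fetchall()
for row in results:
    print(row)
# (1, 1, 'android', 'sports')
# (2, 3, 'ios', 'news')
```

In this sample the ios/news group contains 2 x 3 = 6 joined rows, because every table 1 row pairs with every table 2 row of the same category. That fan-out is why the counts use DISTINCT rather than a plain row count.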