2014-08-27
select 
CASE 
    WHEN ..... 
    ELSE ..... 
END AS carrier, 
count(vehicle_id) as cnt 
from test.vehicle_info 
WHERE vehicle_id NOT IN(select hardware_id 
         from TABLE_DATE_RANGE(test.gps32_,DATE_ADD(CURRENT_TIMESTAMP(), -6,  'DAY'),DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'))) 
group by carrier 
order by cnt 

I get a "Table too large for JOIN" error even though my query does not use a JOIN:

Query Failed 
Error: Table too large for JOIN. Consider using JOIN EACH. For more details, please see https://developers.google.com/bigquery/docs/query-reference#joins 
Job ID: red-road-574:job_e2o6sBjO9Dt5QrU_cRM2VHSRTso 

What causes this, and how can I fix it?


My guess is that `WHERE ... NOT IN (SELECT ...)` gets turned into a `LEFT JOIN` plus an `IS NULL` condition behind the scenes. – hobbs 2014-08-27 14:46:42

Answer


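@hobbs's guess is correct. SEMIJOIN (using WHERE ... IN ...) and ANTIJOIN (using WHERE ... NOT IN ...) are implemented as JOIN operations, so they are subject to the same table-size limit as a plain JOIN. The way around the limit is to rewrite the query yourself as an explicit join using JOIN EACH. That is: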

select 
CASE 
    WHEN ..... 
    ELSE ..... 
END AS carrier, 
count(vi.vehicle_id) as cnt 
from test.vehicle_info vi 
LEFT OUTER JOIN EACH (select hardware_id FROM TABLE_DATE_RANGE(...)) hi 
ON vi.vehicle_id = hi.hardware_id 
WHERE hi.hardware_id is NULL 
group by carrier 
order by cnt
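
The same rewrite should also work for the semijoin case (WHERE ... IN ...) mentioned above. Below is a minimal sketch in BigQuery legacy SQL, reusing the table names and the TABLE_DATE_RANGE call from the question. The GROUP BY inside the subquery is my own addition, on the assumption that a hardware_id can appear more than once in the GPS tables; without it, the inner join would count a vehicle once per matching row.

```sql
-- Sketch only: JOIN EACH rewrite of
--   WHERE vehicle_id IN (SELECT hardware_id FROM TABLE_DATE_RANGE(...))
-- Counts vehicles that DO appear in the GPS tables for the date range.
SELECT
  COUNT(vi.vehicle_id) AS cnt
FROM test.vehicle_info vi
INNER JOIN EACH (
  -- Deduplicate so each vehicle matches at most one row
  -- (assumption: hardware_id values repeat in the GPS tables).
  SELECT hardware_id
  FROM TABLE_DATE_RANGE(test.gps32_,
                        DATE_ADD(CURRENT_TIMESTAMP(), -6, 'DAY'),
                        DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'))
  GROUP BY hardware_id
) hi
ON vi.vehicle_id = hi.hardware_id
```

The anti-join version in the answer does not need this deduplication, because rows kept by `hi.hardware_id IS NULL` have no match at all and so cannot be multiplied. If the subquery itself has too many distinct hardware_id values for a plain GROUP BY, GROUP EACH BY may be needed instead.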