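I have a subscription table and a payments table that I need to join. I'm trying to decide between two options, and performance is a key consideration. Should I put the row-number filter in the join condition or in a preceding CTE?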
Which of the two options below will perform better?
I'm using Impala, and the tables are large (millions of rows). I only need one row per id and date group (hence the row_number() analytic function).
I've shortened the queries to illustrate my question:
OPTION 1:
WITH cte
AS (
SELECT *
, SUM(amount) OVER (PARTITION BY id, date)
AS sameday_total
, ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
AS sameday_rownum
FROM payments
),
payment
AS (
SELECT *
FROM cte
WHERE sameday_rownum = 1
)
SELECT s.*
, p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
OPTION 2:
WITH payment
AS (
SELECT *
, SUM(amount) OVER (PARTITION BY id, date)
AS sameday_total
, ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
AS sameday_rownum
FROM payments
)
SELECT s.*
, p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
AND p.sameday_rownum = 1
Just put the condition in the ON clause. There's no need to clutter the query with two CTEs. –
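Thanks. So, given that it's an inner join, there's no performance impact? I'm wondering whether this is comparable to filtering in the join condition versus filtering in the WHERE clause of the final SQL statement. – cdabel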
You should be able to tell from the query plan whether the optimizer applies the filter at the beginning or at the end. – Connor
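For reference, here is a rough sketch of how that check might look in Impala, using EXPLAIN on Option 2. The table and column names are taken from the question. The s and p aliases, the backticks around date (which may be a reserved word depending on the Impala version), and the choice of EXPLAIN_LEVEL are assumptions added for illustration.

```sql
-- Optional: request a more detailed plan (Impala accepts levels 0 through 3).
SET EXPLAIN_LEVEL=2;

-- EXPLAIN prints the plan without running the query.
EXPLAIN
WITH payment AS (
    SELECT *
         , SUM(amount) OVER (PARTITION BY id, `date`)
             AS sameday_total
         , ROW_NUMBER() OVER (PARTITION BY id, `date` ORDER BY purchase_number DESC)
             AS sameday_rownum
    FROM payments
)
SELECT s.*
     , p.sameday_total
FROM subscription s              -- alias s is assumed
INNER JOIN payment p             -- alias p is assumed
   ON s.id = p.id
  AND p.sameday_rownum = 1;      -- the row-number filter under test
```

Running the same EXPLAIN on Option 1 and comparing the two outputs should show whether the sameday_rownum = 1 predicate is applied right after the analytic step or only at the join. If both plans place it in the same spot, the two forms should perform about the same.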