How reliable are the cost measurements in a PostgreSQL EXPLAIN plan?

The queries run against a large table with 11 million rows. I ran ANALYZE on the table before executing them.

Query 1:

SELECT * 
FROM accounts t1 
LEFT OUTER JOIN accounts t2 
    ON (t1.account_no = t2.account_no 
     AND t1.effective_date < t2.effective_date) 
WHERE t2.account_no IS NULL;

EXPLAIN ANALYZE:

Hash Anti Join (cost=480795.57..1201111.40 rows=7369854 width=292) (actual time=29619.499..115662.111 rows=1977871 loops=1) 
    Hash Cond: ((t1.account_no)::text = (t2.account_no)::text) 
    Join Filter: ((t1.effective_date)::text < (t2.effective_date)::text) 
    -> Seq Scan on accounts t1 (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.025..25693.921 rows=11034070 loops=1) 
    -> Hash (cost=342610.81..342610.81 rows=11054781 width=146) (actual time=29612.925..29612.925 rows=11034070 loops=1) 
     Buckets: 2097152 Batches: 1 Memory Usage: 1834187kB 
     -> Seq Scan on accounts t2 (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.006..22929.635 rows=11034070 loops=1) 
Total runtime: 115870.788 ms

The estimated cost is ~1.2 million, and the actual time taken is ~1.9 minutes.

Query 2:

SELECT t1.* 
FROM accounts t1 
LEFT OUTER JOIN accounts t2 
    ON (t1.account_no = t2.account_no 
     AND t1.effective_date < t2.effective_date) 
WHERE t2.account_no IS NULL;

EXPLAIN ANALYZE:

Hash Anti Join (cost=480795.57..1201111.40 rows=7369854 width=146) (actual time=13365.808..65519.402 rows=1977871 loops=1) 
    Hash Cond: ((t1.account_no)::text = (t2.account_no)::text) 
    Join Filter: ((t1.effective_date)::text < (t2.effective_date)::text) 
    -> Seq Scan on accounts t1 (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.007..5032.778 rows=11034070 loops=1) 
    -> Hash (cost=342610.81..342610.81 rows=11054781 width=18) (actual time=13354.219..13354.219 rows=11034070 loops=1) 
     Buckets: 2097152 Batches: 1 Memory Usage: 545369kB 
     -> Seq Scan on accounts t2 (cost=0.00..342610.81 rows=11054781 width=18) (actual time=0.011..8964.571 rows=11034070 loops=1) 
Total runtime: 65705.707 ms

The estimated cost is ~1.2 million (again), but the actual time taken is < 1.1 minutes.

Query 3:

SELECT * 
FROM accounts 
WHERE (account_no, 
     effective_date) IN 
    (SELECT account_no, 
      max(effective_date) 
    FROM accounts 
    GROUP BY account_no);

EXPLAIN ANALYZE:

Nested Loop (cost=406416.19..502216.84 rows=2763695 width=146) (actual time=31779.457..917543.228 rows=1977871 loops=1) 
    -> HashAggregate (cost=406416.19..406757.45 rows=34126 width=43) (actual time=31774.877..33378.968 rows=1977425 loops=1) 
     -> Subquery Scan on "ANY_subquery" (cost=397884.72..404709.90 rows=341259 width=43) (actual time=27979.226..29841.217 rows=1977425 loops=1) 
       -> HashAggregate (cost=397884.72..401297.31 rows=341259 width=18) (actual time=27979.224..29315.346 rows=1977425 loops=1) 
        -> Seq Scan on accounts (cost=0.00..342610.81 rows=11054781 width=18) (actual time=0.851..16092.755 rows=11034070 loops=1) 
    -> Index Scan using accounts_idx2 on accounts (cost=0.00..2.78 rows=1 width=146) (actual time=0.443..0.445 rows=1 loops=1977425) 
     Index Cond: (((account_no)::text = ("ANY_subquery".account_no)::text) AND ((effective_date)::text = "ANY_subquery".max)) 
Total runtime: 918039.614 ms

The estimated cost is ~502,000, but the actual time taken is ~15.3 minutes!

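How reliable is the EXPLAIN output?
Do we always have to run EXPLAIN ANALYZE to see how a query performs on real data, rather than trusting what the query planner thinks it will cost?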

Source

2014-01-16 ADTC

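Costs are arbitrary numbers. They are meaningful only relative to one another; they have no units and no external meaning. By comparing cost estimates with execution times across a range of queries, you can estimate a rough conversion factor from query cost to execution time on your machine, but that is about all you can do. How reliable the cost estimates are depends heavily on how well the planner does its job, how up to date and how detailed your table statistics are, and whether you hit any known cost-estimation problems (such as correlated columns). –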

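_"By comparing cost estimates with execution times across a range of queries, you can estimate a rough conversion factor from query cost to execution time on your machine"_ In the case above, the rough conversion factor is completely useless. If I roughly estimate the cost-to-time conversion factor from queries 1 and 2, I would expect query 3 to take no more than 45 seconds. **But it takes more than 15 minutes? Why?** – ADTC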

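In other words, the costs seem very misleading. If I trusted the costs, I would choose query 3 over query 2, but the actual execution times show that I should choose query 2 over query 3. – ADTC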
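The conversion-factor arithmetic discussed in the comments above can be sketched roughly as follows. The cost and runtime figures are copied from the three EXPLAIN ANALYZE outputs in the question; the helper names (`ms_per_cost_unit`, `predict_seconds`) and the choice to average the factors from queries 1 and 2 are assumptions made for illustration only.

```python
# Figures copied from the EXPLAIN ANALYZE outputs in the question.
# Each entry: (estimated total cost, actual total runtime in milliseconds).
plans = {
    "query1": (1201111.40, 115870.788),
    "query2": (1201111.40, 65705.707),
    "query3": (502216.84, 918039.614),
}


def ms_per_cost_unit(cost, runtime_ms):
    """Rough conversion factor: milliseconds of runtime per unit of planner cost."""
    return runtime_ms / cost


def predict_seconds(cost, factor):
    """Predicted runtime in seconds for a plan with the given estimated cost."""
    return cost * factor / 1000.0


factors = {name: ms_per_cost_unit(c, t) for name, (c, t) in plans.items()}

# Calibrate on queries 1 and 2 (hypothetical choice: a simple average) ...
calibrated = (factors["query1"] + factors["query2"]) / 2

# ... then predict query 3 from its estimated cost alone.
predicted_q3 = predict_seconds(plans["query3"][0], calibrated)
actual_q3 = plans["query3"][1] / 1000.0

print(f"query 3 predicted: {predicted_q3:.0f} s, actual: {actual_q3:.0f} s")
```

Even between queries 1 and 2, which have identical estimated costs, the observed factor differs by almost a factor of two, so a single conversion factor is at best a coarse guide on this data.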

They are reliable, except when they aren't. You can't really generalize.

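It looks like the planner is dramatically underestimating the number of distinct account_no values it will find (it expects 34126 but actually finds 1977425). Your default_statistics_target may not be high enough to get a good estimate for this column.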
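One way to spot this kind of misestimate is to compare the estimated and actual row counts for each plan node. The sketch below does that for the query 3 plan text from the question. The function name `misestimated_nodes`, the regular expression, and the 10x threshold are assumptions chosen for illustration; they are not part of PostgreSQL, and the parser only handles the text format shown in this thread.

```python
import re

# Plan lines copied from the EXPLAIN ANALYZE output for query 3.
PLAN_QUERY3 = """
Nested Loop (cost=406416.19..502216.84 rows=2763695 width=146) (actual time=31779.457..917543.228 rows=1977871 loops=1)
    -> HashAggregate (cost=406416.19..406757.45 rows=34126 width=43) (actual time=31774.877..33378.968 rows=1977425 loops=1)
     -> Subquery Scan on "ANY_subquery" (cost=397884.72..404709.90 rows=341259 width=43) (actual time=27979.226..29841.217 rows=1977425 loops=1)
       -> HashAggregate (cost=397884.72..401297.31 rows=341259 width=18) (actual time=27979.224..29315.346 rows=1977425 loops=1)
        -> Seq Scan on accounts (cost=0.00..342610.81 rows=11054781 width=18) (actual time=0.851..16092.755 rows=11034070 loops=1)
    -> Index Scan using accounts_idx2 on accounts (cost=0.00..2.78 rows=1 width=146) (actual time=0.443..0.445 rows=1 loops=1977425)
"""

# Matches one plan-node line and captures the estimated and actual row counts.
NODE_LINE = re.compile(
    r"^\s*(?:->\s*)?(?P<node>.+?)\s+"
    r"\(cost=\S+\s+rows=(?P<est>\d+)\s+width=\d+\)\s+"
    r"\(actual time=\S+\s+rows=(?P<act>\d+)\s+loops=\d+\)"
)


def misestimated_nodes(plan_text, threshold=10.0):
    """Return (node, estimated, actual, ratio) for nodes off by >= threshold.

    The threshold is an arbitrary illustrative cutoff, not a PostgreSQL setting.
    """
    flagged = []
    for line in plan_text.splitlines():
        match = NODE_LINE.match(line)
        if match is None:
            continue  # skip blank lines and detail lines such as "Index Cond"
        est = max(int(match["est"]), 1)  # avoid division by zero
        act = max(int(match["act"]), 1)
        ratio = act / est
        if ratio >= threshold or ratio <= 1.0 / threshold:
            flagged.append((match["node"], est, act, ratio))
    return flagged


flagged = misestimated_nodes(PLAN_QUERY3)
for node, est, act, ratio in flagged:
    print(f"{node}: estimated {est} rows, actual {act} rows ({ratio:.1f}x)")
```

With this threshold, only the outer HashAggregate is flagged, which is the node the answer points to. A lower threshold would also flag the inner HashAggregate and the Subquery Scan, whose estimates are off by a smaller margin.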

Source

2014-01-16 23:37:57 jjanes

That's a good tip! I take it that one clue to a bad estimate is that the estimated row count is not close to the actual row count. – ADTC
