2013-07-16 142 views
SELECT COUNT(*) 
FROM "businesses" 
WHERE (businesses.postal_code_id IN 
     (SELECT id 
      FROM postal_codes 
      WHERE lower(city) IN ('los angeles') 
      AND lower(region) = 'california')) 
    AND (EXISTS 
     (SELECT * 
      FROM categorizations c 
      WHERE c.business_id=businesses.id 
      AND c.category_id IN (86))) 

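I have a Postgres database of businesses, categories, and locations. This query took 95665.9 ms to execute, and I'm fairly sure the bottleneck is in the categorizations. Is there a better way to do this? The count it returns is 1032. How can I optimize this PostgreSQL count query?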

=# EXPLAIN ANALYZE SELECT COUNT(*) 
-# FROM "businesses" 
-# WHERE (businesses.postal_code_id IN 
(#   (SELECT id 
(#   FROM postal_codes 
(#   WHERE lower(city) IN ('los angeles') 
(#    AND lower(region) = 'california')); 
                      QUERY PLAN                    
--------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Aggregate (cost=4007.74..4007.75 rows=1 width=0) (actual time=263820.923..263820.924 rows=1 loops=1) 
    -> Nested Loop (cost=41.93..4005.20 rows=1015 width=0) (actual time=469.716..263679.865 rows=112513 loops=1) 
     -> HashAggregate (cost=15.59..15.60 rows=1 width=4) (actual time=332.664..332.946 rows=82 loops=1) 
       -> Bitmap Heap Scan on postal_codes (cost=11.57..15.59 rows=1 width=4) (actual time=84.772..332.407 rows=82 loops=1) 
        Recheck Cond: ((lower((city)::text) = 'los angeles'::text) AND (lower((region)::text) = 'california'::text)) 
        -> BitmapAnd (cost=11.57..11.57 rows=1 width=0) (actual time=77.530..77.530 rows=0 loops=1) 
          -> Bitmap Index Scan on idx_postal_codes_lower_city (cost=0.00..5.66 rows=187 width=0) (actual time=22.800..22.800 rows=82 loops=1) 
           Index Cond: (lower((city)::text) = 'los angeles'::text) 
          -> Bitmap Index Scan on idx_postal_codes_lower_region (cost=0.00..5.66 rows=187 width=0) (actual time=54.714..54.714 rows=2356 loops=1) 
           Index Cond: (lower((region)::text) = 'california'::text) 
     -> Bitmap Heap Scan on businesses (cost=26.34..3976.91 rows=1015 width=4) (actual time=95.926..3208.426 rows=1372 loops=82) 
       Recheck Cond: (postal_code_id = postal_codes.id) 
       -> Bitmap Index Scan on index_businesses_on_postal_code_id (cost=0.00..26.08 rows=1015 width=0) (actual time=89.864..89.864 rows=1380 loops=82) 
        Index Cond: (postal_code_id = postal_codes.id) 
Total runtime: 263821.016 ms 
(15 rows) 

And the join version:

EXPLAIN ANALYZE SELECT count(*) FROM businesses 
LEFT JOIN postal_codes 
ON businesses.postal_code_id = postal_codes.id 
WHERE lower(postal_codes.city) = 'los angeles' 
AND lower(postal_codes.region) = 'california'; 

                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4006.14..4006.15 rows=1 width=0) (actual time=143357.170..143357.171 rows=1 loops=1)
  -> Nested Loop (cost=37.91..4005.19 rows=381 width=0) (actual time=138.666..143218.064 rows=112514 loops=1)
       -> Bitmap Heap Scan on postal_codes (cost=11.57..15.59 rows=1 width=4) (actual time=0.559..33.957 rows=82 loops=1)
            Recheck Cond: ((lower((city)::text) = 'los angeles'::text) AND (lower((region)::text) = 'california'::text))
            -> BitmapAnd (cost=11.57..11.57 rows=1 width=0) (actual time=0.532..0.532 rows=0 loops=1)
                 -> Bitmap Index Scan on idx_postal_codes_lower_city (cost=0.00..5.66 rows=187 width=0) (actual time=0.058..0.058 rows=82 loops=1)
                      Index Cond: (lower((city)::text) = 'los angeles'::text)
                 -> Bitmap Index Scan on idx_postal_codes_lower_region (cost=0.00..5.66 rows=187 width=0) (actual time=0.461..0.461 rows=2356 loops=1)
                      Index Cond: (lower((region)::text) = 'california'::text)
       -> Bitmap Heap Scan on businesses (cost=26.34..3976.91 rows=1015 width=4) (actual time=55.493..1742.407 rows=1372 loops=82)
            Recheck Cond: (postal_code_id = postal_codes.id)
            -> Bitmap Index Scan on index_businesses_on_postal_code_id (cost=0.00..26.09 rows=1015 width=0) (actual time=53.141..53.141 rows=1381 loops=82)
                 Index Cond: (postal_code_id = postal_codes.id)
Total runtime: 143357.260 ms

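The simplified query returns a much larger result, but since there are indexes and I'm only doing a single join, I'm surprised it takes this long.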


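Yes, it is really strange that such a cheap query runs for so long. You can a) install PostgreSQL from source on a development machine, compiled without optimization and with debugging and profiling support, and try to get a low-level profile; b) try penalizing nested loops; c) check your server: CPU speed, IO speed, Postgres configuration. A query with this cost should be evaluated in 1-2 seconds.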
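Options (b) and (c) from the comment above can be tried from a psql session. A minimal sketch, assuming the simplified query from the question: `enable_nestloop` is a standard planner setting, and the `BUFFERS` option of `EXPLAIN` (available since PostgreSQL 9.0) reports how many blocks came from shared buffers versus being read, which may help tell whether the slow heap scans are IO-bound.

```sql
-- Discourage nested loops for this session only; the planner may still
-- pick one if it finds no other usable join method.
SET enable_nestloop = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM businesses
WHERE businesses.postal_code_id IN
      (SELECT id
       FROM postal_codes
       WHERE lower(city) IN ('los angeles')
         AND lower(region) = 'california');

-- Restore the default so later queries are planned normally.
RESET enable_nestloop;
```

If the plan changes to a hash or merge join and the runtime drops sharply, the row estimates are the likely culprit; if it stays slow, the server checks in (c) become more interesting.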

Answers


Try using a functional index on the city column:

 
CREATE INDEX ON postal_codes((lower(city))) 

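There is a strong dependency between the city and region columns, so sometimes you have to split these predicates to improve the accuracy of the planner's estimates. If you need better estimates, add lower_city and lower_region columns to the postal_codes table; PostgreSQL keeps no statistics on the dependency between the two expressions.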
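A sketch of the extra-column approach described above. The column names lower_city and lower_region come from the answer; the index name and the choice of a single two-column index are assumptions made for illustration, and the new columns would have to be kept in sync by the application or a trigger.

```sql
-- Materialize the lowercased values as ordinary columns.
ALTER TABLE postal_codes ADD COLUMN lower_city text;
ALTER TABLE postal_codes ADD COLUMN lower_region text;

UPDATE postal_codes
SET lower_city   = lower(city),
    lower_region = lower(region);

-- Hypothetical index name; one index covering both predicates.
CREATE INDEX idx_postal_codes_lower_region_city
    ON postal_codes (lower_region, lower_city);

-- Refresh planner statistics so the new columns are taken into account.
ANALYZE postal_codes;

-- The subquery then filters on the plain columns.
SELECT id
FROM postal_codes
WHERE lower_city = 'los angeles'
  AND lower_region = 'california';
```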

Please post the execution plan here, via http://explain.depesz.com/, ideally the output of EXPLAIN ANALYZE YOUR_QUERY.

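9.1 should transform the correlated subquery into a semi-join automatically, but I'm not sure. Try rewriting your query from subqueries into a form that uses only INNER JOINs (it may not help, but maybe).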
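The INNER JOIN-only form suggested above might look like this. It is a sketch, not a guaranteed improvement: `COUNT(DISTINCT b.id)` is used on the assumption that a business could have more than one matching row in categorizations, which would otherwise inflate the count compared with the EXISTS version.

```sql
SELECT COUNT(DISTINCT b.id)
FROM businesses b
INNER JOIN postal_codes p
        ON p.id = b.postal_code_id
INNER JOIN categorizations c
        ON c.business_id = b.id
WHERE lower(p.city) = 'los angeles'
  AND lower(p.region) = 'california'
  AND c.category_id = 86;
```

If (business_id, category_id) is known to be unique in categorizations, a plain `COUNT(*)` returns the same number and avoids the extra de-duplication step.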


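Unfortunately, the functional index didn't help, and neither did the join. I'm not sure what else I can do to optimize this, beyond hardware or denormalization. I simplified the query and updated the description above. – wachutu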


Do you have to use LEFT JOIN? INNER JOIN should probably be enough.
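For reference, the join version from the question written with INNER JOIN. Because the WHERE clause filters on postal_codes columns, businesses without a matching postal code are discarded either way, so both forms should return the same count. The plan in the question already shows a plain Nested Loop rather than a left join, so this rewrite mainly makes the intent explicit and may not change the runtime.

```sql
EXPLAIN ANALYZE
SELECT count(*)
FROM businesses
INNER JOIN postal_codes
        ON businesses.postal_code_id = postal_codes.id
WHERE lower(postal_codes.city) = 'los angeles'
  AND lower(postal_codes.region) = 'california';
```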