Making the Postgres query faster. More indexes?

I'm running GeoDjango/Postgres 9.1/PostGIS, and I'm trying to make the following query (and others like it) run faster.

[Query snipped for brevity]

SELECT "crowdbreaks_incomingkeyword"."keyword_id" 
     , COUNT("crowdbreaks_incomingkeyword"."keyword_id") AS "cnt" 
    FROM "crowdbreaks_incomingkeyword" 
INNER JOIN "crowdbreaks_tweet" 
     ON ("crowdbreaks_incomingkeyword"."tweet_id" 
      = "crowdbreaks_tweet"."tweet_id") 
    LEFT OUTER JOIN "crowdbreaks_place" 
    ON ("crowdbreaks_tweet"."place_id" 
     = "crowdbreaks_place"."place_id") 
WHERE (("crowdbreaks_tweet"."coordinates" 
     @ ST_GeomFromEWKB(E'\\001 ... '::bytea) 
     OR ST_Overlaps("crowdbreaks_place"."bounding_box" 
        , ST_GeomFromEWKB(E'\\001 ... '::bytea) 
     )) 
    AND "crowdbreaks_tweet"."created_at" > E'2012-04-17 15:46:12.109893' 
    AND "crowdbreaks_tweet"."created_at" < E'2012-04-18 15:46:12.109899') 
GROUP BY "crowdbreaks_incomingkeyword"."keyword_id" 
     , "crowdbreaks_incomingkeyword"."keyword_id" 
    ; 

Here is what the crowdbreaks_tweet table looks like:

\d+ crowdbreaks_tweet; 
         Table "public.crowdbreaks_tweet" 
    Column  |   Type   | Modifiers | Storage | Description 
---------------+--------------------------+-----------+----------+------------- 
tweet_id  | bigint     | not null | plain | 
tweeter  | bigint     | not null | plain | 
text   | text      | not null | extended | 
created_at | timestamp with time zone | not null | plain | 
country_code | character varying(3)  |   | extended | 
place_id  | character varying(32) |   | extended | 
coordinates | geometry     |   | main  | 
Indexes: 
    "crowdbreaks_tweet_pkey" PRIMARY KEY, btree (tweet_id) 
    "crowdbreaks_tweet_coordinates_id" gist (coordinates) 
    "crowdbreaks_tweet_created_at" btree (created_at) 
    "crowdbreaks_tweet_place_id" btree (place_id) 
    "crowdbreaks_tweet_place_id_like" btree (place_id varchar_pattern_ops) 
Check constraints: 
    "enforce_dims_coordinates" CHECK (st_ndims(coordinates) = 2) 
    "enforce_geotype_coordinates" CHECK (geometrytype(coordinates) = 'POINT'::text OR coordinates IS NULL) 
    "enforce_srid_coordinates" CHECK (st_srid(coordinates) = 4326) 
Foreign-key constraints: 
    "crowdbreaks_tweet_place_id_fkey" FOREIGN KEY (place_id) REFERENCES crowdbreaks_place(place_id) DEFERRABLE INITIALLY DEFERRED 
Referenced by: 
    TABLE "crowdbreaks_incomingkeyword" CONSTRAINT "crowdbreaks_incomingkeyword_tweet_id_fkey" FOREIGN KEY (tweet_id) REFERENCES crowdbreaks_tweet(tweet_id) DEFERRABLE INITIALLY DEFERRED 
    TABLE "crowdbreaks_tweetanswer" CONSTRAINT "crowdbreaks_tweetanswer_tweet_id_id_fkey" FOREIGN KEY (tweet_id_id) REFERENCES crowdbreaks_tweet(tweet_id) DEFERRABLE INITIALLY DEFERRED 
Has OIDs: no 

Here is the EXPLAIN ANALYZE of the query:

HashAggregate (cost=184022.03..184023.18 rows=115 width=4) (actual time=6381.707..6381.769 rows=62 loops=1) 
    -> Hash Join (cost=103857.48..183600.24 rows=84357 width=4) (actual time=1745.449..6377.505 rows=3453 loops=1) 
     Hash Cond: (crowdbreaks_incomingkeyword.tweet_id = crowdbreaks_tweet.tweet_id) 
     -> Seq Scan on crowdbreaks_incomingkeyword (cost=0.00..36873.97 rows=2252597 width=12) (actual time=0.008..2136.839 rows=2252597 loops=1) 
     -> Hash (cost=102535.68..102535.68 rows=80544 width=8) (actual time=1744.815..1744.815 rows=3091 loops=1) 
       Buckets: 4096 Batches: 4 Memory Usage: 32kB 
       -> Hash Left Join (cost=16574.93..102535.68 rows=80544 width=8) (actual time=112.551..1740.651 rows=3091 loops=1) 
        Hash Cond: ((crowdbreaks_tweet.place_id)::text = (crowdbreaks_place.place_id)::text) 
        Filter: ((crowdbreaks_tweet.coordinates @ '0103000020E61000000100000005000000AE47E17A141E5FC00000000000003840AE47E17A141E5FC029ED0DBE30B14840A4703D0AD7A350C029ED0DBE30B14840A4703D0AD7A350C00000000000003840AE47E17A141E5FC00000000000003840'::geometry) OR ((crowdbreaks_place.bounding_box && '0103000020E61000000100000005000000AE47E17A141E5FC00000000000003840AE47E17A141E5FC029ED0DBE30B14840A4703D0AD7A350C029ED0DBE30B14840A4703D0AD7A350C00000000000003840AE47E17A141E5FC00000000000003840'::geometry) AND _st_overlaps(crowdbreaks_place.bounding_box, '0103000020E61000000100000005000000AE47E17A141E5FC00000000000003840AE47E17A141E5FC029ED0DBE30B14840A4703D0AD7A350C029ED0DBE30B14840A4703D0AD7A350C00000000000003840AE47E17A141E5FC00000000000003840'::geometry))) 
        -> Bitmap Heap Scan on crowdbreaks_tweet (cost=15874.18..67060.28 rows=747873 width=125) (actual time=96.012..940.462 rows=736784 loops=1) 
          Recheck Cond: ((created_at > '2012-04-17 15:46:12.109893+00'::timestamp with time zone) AND (created_at < '2012-04-18 15:46:12.109899+00'::timestamp with time zone)) 
          -> Bitmap Index Scan on crowdbreaks_tweet_crreated_at (cost=0.00..15687.22 rows=747873 width=0) (actual time=94.259..94.259 rows=736784 loops=1) 
           Index Cond: ((created_at > '2012-04-17 15:46:12.109893+00'::timestamp with time zone) AND (created_at < '2012-04-18 15:46:12.109899+00'::timestamp with time zone)) 
        -> Hash (cost=217.11..217.11 rows=6611 width=469) (actual time=15.926..15.926 rows=6611 loops=1) 
          Buckets: 1024 Batches: 4 Memory Usage: 259kB 
          -> Seq Scan on crowdbreaks_place (cost=0.00..217.11 rows=6611 width=469) (actual time=0.005..6.908 rows=6611 loops=1) 
Total runtime: 6381.903 ms 
(17 rows) 

That is a pretty bad runtime for the query. Ideally, I'd like results back within a second or two.

I've already increased Postgres's shared_buffers to 2GB (I have 8GB of RAM), but beyond that I'm not sure what to do. What are my options? Should I reduce the number of joins? Are there other indexes I should add? The sequential scan on crowdbreaks_incomingkeyword doesn't make sense to me: it is the foreign-key table for the other tables, so it has indexes on it.
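As a quick sanity check (a sketch; note that in PostgreSQL a foreign-key constraint by itself does not create an index on the referencing column), the catalogs show which indexes actually exist on crowdbreaks_incomingkeyword:

    -- list all indexes defined on the table
    SELECT indexname, indexdef 
      FROM pg_indexes 
     WHERE tablename = 'crowdbreaks_incomingkeyword'; 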

If I do 'SET enable_seqscan = off;' before running the EXPLAIN ANALYZE, the query time drops to 1.8 seconds. However, everything I've read says I shouldn't do that. – Khandelwal 2012-04-19 16:23:22
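For reference, that experiment looks roughly like this as a session-level test (a sketch for diagnosis only; the setting reverts with RESET or at the end of the session):

    SET enable_seqscan = off;   -- discourage sequential scans for this session only 
    -- ... re-run the EXPLAIN ANALYZE from above and compare the plan ... 
    RESET enable_seqscan;       -- restore the default 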

I pasted your EXPLAIN output here for a better view: http://explain.depesz.com/s/EfH – 2012-04-19 16:43:18

BTW: what effect does the duplicated term "crowdbreaks_incomingkeyword"."keyword_id" in the GROUP BY clause have? (Maybe the optimizer forgets to remove the redundancy and estimates the selectivity of the group as 1/square(number of groups).) – wildplasser 2012-04-19 22:10:54

Answer

Judging from your comments, I would try two things:

  • Raise the statistics target for the columns involved (and run ANALYZE).

    ALTER TABLE tbl ALTER COLUMN col SET STATISTICS 1000; 
    

The data distribution may be uneven. A larger sample gives the query planner more accurate estimates.

  • Review the cost settings in postgresql.conf. Your sequential scans may need to carry a higher cost relative to index scans to produce good estimates.

Try lowering cpu_index_tuple_cost, and set effective_cache_size to something as high as three quarters of your total RAM for a dedicated DB server (see the sketch below).
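A minimal sketch of what that might look like, assuming a dedicated server with 8GB of RAM (the values are illustrative and can be tried per session with SET before committing them to postgresql.conf):

    SET effective_cache_size = '6GB';    -- about three quarters of 8GB RAM 
    SET cpu_index_tuple_cost = 0.001;    -- default is 0.005; lower nudges the planner toward index scans 
    -- if the plans improve, make the same changes in postgresql.conf and reload 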

Do I raise the statistics on the columns I filter by, the columns I select, or both? – Khandelwal 2012-04-19 16:43:03

@Khandelwal: the ones you filter and join on are what matters to the plan; the ones you merely select are irrelevant. – 2012-04-19 16:44:49
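Applied to the columns the query above filters and joins on, that could look something like this (a sketch; 1000 is the target value tried in the follow-up comment below, not a general recommendation):

    ALTER TABLE crowdbreaks_tweet ALTER COLUMN created_at SET STATISTICS 1000; 
    ALTER TABLE crowdbreaks_incomingkeyword ALTER COLUMN tweet_id SET STATISTICS 1000; 
    ANALYZE crowdbreaks_tweet; 
    ANALYZE crowdbreaks_incomingkeyword; 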

Raising the statistics doesn't seem to make a difference. I set the statistics target to 1000 on crowdbreaks_tweet.created_at and crowdbreaks_incomingkeyword.tweet_id, and it didn't change the plan. – Khandelwal 2012-04-19 17:01:53