Optimizing a PostgreSQL query with a slow ORDER BY

2013-10-11

I'm trying to make this query run faster. It looks like sorting on the quality field is what really slows it down (the table has about 5 million rows). Maybe there is an index I could use?

The query:

SELECT "connectr_twitterpassage"."id", "connectr_twitterpassage"."third_party_id", "connectr_twitterpassage"."third_party_created", "connectr_twitterpassage"."source", "connectr_twitterpassage"."text", "connectr_twitterpassage"."author", "connectr_twitterpassage"."raw_data", "connectr_twitterpassage"."retweet_count", "connectr_twitterpassage"."favorited_count", "connectr_twitterpassage"."lang", "connectr_twitterpassage"."location", "connectr_twitterpassage"."author_followers_count", "connectr_twitterpassage"."is_retweet", "connectr_twitterpassage"."url", "connectr_twitterpassage"."author_fk_id", "connectr_twitterpassage"."quality", "connectr_twitterpassage"."is_top_tweet", "connectr_twitterpassage"."created", "connectr_twitterpassage"."modified" 
FROM "connectr_twitterpassage" 
INNER JOIN "connectr_twitterpassage_words" ON ("connectr_twitterpassage"."id" = "connectr_twitterpassage_words"."twitterpassage_id") 
WHERE "connectr_twitterpassage_words"."word_id" = 18974807 
ORDER BY "connectr_twitterpassage"."quality" DESC 
LIMIT 100 

Here is the EXPLAIN ANALYZE output: http://explain.depesz.com/s/7zb

And the table definitions:

\d connectr_twitterpassage 

                Table "public.connectr_twitterpassage"
             Column         |           Type           |                              Modifiers
    ------------------------+--------------------------+----------------------------------------------------------------------
     id                     | integer                  | not null default nextval('connectr_twitterpassage_id_seq'::regclass)
     third_party_id         | character varying(10000) | not null
     source                 | character varying(10000) | not null
     text                   | character varying(10000) | not null
     author                 | character varying(10000) | not null
     raw_data               | character varying(10000) | not null
     created                | timestamp with time zone | not null
     modified               | timestamp with time zone | not null
     third_party_created    | timestamp with time zone |
     retweet_count          | integer                  | not null
     favorited_count        | integer                  | not null
     lang                   | character varying(10000) | not null
     location               | character varying(10000) | not null
     author_followers_count | integer                  | not null
     is_retweet             | boolean                  | not null
     url                    | character varying(10000) | not null
     author_fk_id           | integer                  |
     quality                | bigint                   |
     is_top_tweet           | boolean                  | not null
Indexes: 
    "connectr_passage_pkey" PRIMARY KEY, btree (id) 
    "connectr_twitterpassage_third_party_id_uniq" UNIQUE CONSTRAINT, btree (third_party_id) 
    "connectr_passage_author_followers_count" btree (author_followers_count) 
    "connectr_passage_favorited_count" btree (favorited_count) 
    "connectr_passage_retweet_count" btree (retweet_count) 
    "connectr_passage_source" btree (source) 
    "connectr_passage_source_like" btree (source varchar_pattern_ops) 
    "connectr_passage_third_party_id" btree (third_party_id) 
    "connectr_passage_third_party_id_like" btree (third_party_id varchar_pattern_ops) 
    "connectr_twitterpassage_author_fk_id" btree (author_fk_id) 
    "connectr_twitterpassage_created" btree (created) 
    "connectr_twitterpassage_is_top_tweet" btree (is_top_tweet) 
    "connectr_twitterpassage_quality" btree (quality) 
    "connectr_twitterpassage_third_party_created" btree (third_party_created) 
    "id_to_quality_sorted" btree (id, quality DESC NULLS LAST) 
Foreign-key constraints: 
    "author_fk_id_refs_id_074720a5" FOREIGN KEY (author_fk_id) REFERENCES connectr_twitteruser(id) DEFERRABLE INITIALLY DEFERRED 
Referenced by: 
    TABLE "connectr_passageviewevent" CONSTRAINT "passage_id_refs_id_892b36a6" FOREIGN KEY (passage_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED 
    TABLE "connectr_connection" CONSTRAINT "twitter_from_id_refs_id_8adbab24" FOREIGN KEY (twitter_from_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED 
    TABLE "connectr_connection" CONSTRAINT "twitter_to_id_refs_id_8adbab24" FOREIGN KEY (twitter_to_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED 
    TABLE "connectr_twitterpassage_words" CONSTRAINT "twitterpassage_id_refs_id_720f772f" FOREIGN KEY (twitterpassage_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED 


connectr=# \d connectr_twitterpassage_words 
           Table "public.connectr_twitterpassage_words" 
       Column       |  Type   |                                 Modifiers
 -------------------+---------+----------------------------------------------------------------------------
 id                | integer | not null default nextval('connectr_twitterpassage_words_id_seq'::regclass)
 twitterpassage_id | integer | not null
 word_id           | integer | not null
Indexes: 
    "connectr_twitterpassage_words_pkey" PRIMARY KEY, btree (id) 
    "connectr_twitterpassage_twitterpassage_id_613c80271f09fba8_uniq" UNIQUE CONSTRAINT, btree (twitterpassage_id, word_id) 
    "connectr_twitterpassage_words_twitterpassage_id" btree (twitterpassage_id) 
    "connectr_twitterpassage_words_word_id" btree (word_id) 
    "word_to_twitterpassage_id" btree (word_id, twitterpassage_id) 
Foreign-key constraints: 
    "twitterpassage_id_refs_id_720f772f" FOREIGN KEY (twitterpassage_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED 
    "word_id_refs_id_64f49629" FOREIGN KEY (word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED 

connectr=# \d connectr_word 
             Table "public.connectr_word" 
        Column        |           Type           |                         Modifiers
 ---------------------+--------------------------+------------------------------------------------------------
 id                  | integer                  | not null default nextval('connectr_word_id_seq'::regclass)
 word                | character varying(10000) | not null
 created             | timestamp with time zone | not null
 modified            | timestamp with time zone | not null
 frequency           | double precision         |
 is_username         | boolean                  | not null
 is_hashtag          | boolean                  | not null
 cloud_eligible      | boolean                  | not null
 passage_count       | integer                  |
 avg_quality         | double precision         |
 last_twitter_search | timestamp with time zone |
 cloud_approved      | boolean                  | not null
 display_word        | character varying(10000) | not null
 is_trend            | boolean                  | not null
Indexes: 
    "connectr_word_pkey" PRIMARY KEY, btree (id) 
    "connectr_word_word_uniq" UNIQUE CONSTRAINT, btree (word) 
    "connectr_word_avg_quality" btree (avg_quality) 
    "connectr_word_cloud_eligible" btree (cloud_eligible) 
    "connectr_word_last_twitter_search" btree (last_twitter_search) 
    "connectr_word_passage_count" btree (passage_count) 
    "connectr_word_word" btree (word) 
Referenced by: 
    TABLE "connectr_passageviewevent" CONSTRAINT "source_word_id_refs_id_178d46eb" FOREIGN KEY (source_word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED 
    TABLE "connectr_wordmatchrewardevent" CONSTRAINT "tapped_word_id_refs_id_c2ffb369" FOREIGN KEY (tapped_word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED 
    TABLE "connectr_connection" CONSTRAINT "word_id_refs_id_00cccde2" FOREIGN KEY (word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED 
    TABLE "connectr_twitterpassage_words" CONSTRAINT "word_id_refs_id_64f49629" FOREIGN KEY (word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED 

This looks a bit like full-text search. Are you, by any chance, trying to query for 'connectr_twitterpassage' rows that mention a particular "word"? If so, you shouldn't roll your own implementation; Postgres has very good built-in support for this. – SingleNegationElimination
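The built-in full-text search the commenter is referring to might be sketched like this; the `text_tsv` column and index name are illustrative and not part of the original schema:

```sql
-- Add a tsvector column, populate it, and index it with GIN.
ALTER TABLE connectr_twitterpassage ADD COLUMN text_tsv tsvector;
UPDATE connectr_twitterpassage SET text_tsv = to_tsvector('english', text);
CREATE INDEX connectr_twitterpassage_text_tsv
    ON connectr_twitterpassage USING gin (text_tsv);

-- A word lookup then needs no join through the words table at all.
SELECT id, text, quality
FROM connectr_twitterpassage
WHERE text_tsv @@ plainto_tsquery('english', 'someword')
ORDER BY quality DESC
LIMIT 100;
```

The GIN index makes the `@@` match fast, but the ORDER BY on quality would still have to sort whatever rows match.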


Well, my thinking was that looking up the word in the words table by exact match, and then finding the matching twitterpassages for that word, should be faster. I tried pg_trgm before and it wasn't any faster. –


If you can afford the downtime, running CLUSTER on the words table by the word_id column could be very beneficial. I would also drop the single-column word_id and passage id indexes from the words table. –
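The CLUSTER suggestion above could look like the following, using the existing composite index word_to_twitterpassage_id as the clustering order (note that CLUSTER takes an exclusive lock and rewrites the whole table, hence the downtime caveat):

```sql
-- Physically reorder the join table by (word_id, twitterpassage_id),
-- so all rows for one word sit on adjacent pages.
CLUSTER connectr_twitterpassage_words USING word_to_twitterpassage_id;

-- The single-column indexes are redundant with the composite ones.
DROP INDEX connectr_twitterpassage_words_word_id;
DROP INDEX connectr_twitterpassage_words_twitterpassage_id;

-- Refresh planner statistics after the rewrite.
ANALYZE connectr_twitterpassage_words;
```

Clustering helps because fetching all join rows for one word_id then touches far fewer disk pages; the ordering decays as new rows are inserted, so it has to be repeated periodically.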

Answer

Judging from the EXPLAIN output, the sort itself takes very little time. What is expensive is gathering the data that needs to be sorted.

You must be spending time hitting disk. If you can keep the data better cached, the same query should speed up considerably.

Otherwise, your best bet is probably to denormalize: add the quality field to the connectr_twitterpassage_words table, and put an index on (word_id, quality, ...).
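A sketch of that denormalization, assuming the copied quality column is kept in sync by application code or a trigger (not shown); the index name is illustrative:

```sql
-- Copy quality into the join table.
ALTER TABLE connectr_twitterpassage_words ADD COLUMN quality bigint;
UPDATE connectr_twitterpassage_words w
SET quality = p.quality
FROM connectr_twitterpassage p
WHERE p.id = w.twitterpassage_id;

-- An index matching both the filter and the sort order lets the planner
-- read the top 100 ids straight off the index, with no sort step.
CREATE INDEX connectr_words_word_id_quality
    ON connectr_twitterpassage_words (word_id, quality DESC NULLS LAST);

SELECT twitterpassage_id
FROM connectr_twitterpassage_words
WHERE word_id = 18974807
ORDER BY quality DESC NULLS LAST
LIMIT 100;
```

The wide passage rows can then be fetched by primary key for just those 100 ids, instead of for every row matching the word.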