2012-08-13 18 views
3

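Hi dear Postgres users, how can I index this query for full-text search?

Short story: I have an addresses table that is related to states (each address has a state). I run the following query to search: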

SELECT "addresses".*, (ts_rank((to_tsvector('simple', coalesce("addresses"."name"::text, '')) || to_tsvector('simple', coalesce(pg_search_85240c410826a2b0e0f0e5.pg_search_442c4ad3183a256248ef8d::text, ''))), (to_tsquery('simple', ''' ' || 'test' || ' ''' || ':*')), 0)) AS pg_search_rank 
FROM "addresses" 
LEFT OUTER JOIN (SELECT "addresses"."id" AS id, string_agg("states"."name"::text, ' ') AS pg_search_442c4ad3183a256248ef8d 
      FROM "addresses" INNER JOIN "states" ON "states"."id" = "addresses"."state_id" GROUP BY "addresses"."id") pg_search_85240c410826a2b0e0f0e5 ON pg_search_85240c410826a2b0e0f0e5.id = "addresses"."id" 
      WHERE (((to_tsvector('simple', coalesce("addresses"."name"::text, '')) || to_tsvector('simple', coalesce(pg_search_85240c410826a2b0e0f0e5.pg_search_442c4ad3183a256248ef8d::text, ''))) @@ (to_tsquery('simple', ''' ' || 'test' || ' ''' || ':*')))) AND (company_id = 2142) 
      ORDER BY pg_search_rank DESC, "addresses"."id" ASC 

Straightforward, right? On a good machine with 8581 addresses this takes about 1.226 ms. I need to improve on that, so I created these 2 indexes:

CREATE INDEX index_addresses_search_by_name ON addresses USING gin(to_tsvector('simple', COALESCE((public.addresses.name)::text, ''))); 
CREATE INDEX index_state_search_by_name ON states USING gin(to_tsvector('simple', COALESCE((public.states.name)::text, ''))); 

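I expected this to help, with one index on addresses and one on states, but it does not :( The query is as slow as before, and EXPLAIN shows that the indexes are not used.

Any suggestions would be appreciated.

Edit:

Here is what EXPLAIN ANALYZE shows: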

"Sort (cost=11.39..11.40 rows=1 width=4824) (actual time=0.947..0.947 rows=0 loops=1)" 
" Sort Key: (ts_rank((to_tsvector('simple'::regconfig, COALESCE((public.addresses.name)::text, ''::text)) || to_tsvector('simple'::regconfig, COALESCE((string_agg((states.name)::text, ' '::text)), ''::text))), '''test'':*'::tsquery, 0)), public.addresses.id" 
" Sort Method: quicksort Memory: 17kB" 
" -> Nested Loop Left Join (cost=6.19..11.38 rows=1 width=4824) (actual time=0.905..0.905 rows=0 loops=1)" 
"  Join Filter: (public.addresses.id = public.addresses.id)" 
"  Filter: ((to_tsvector('simple'::regconfig, COALESCE((public.addresses.name)::text, ''::text)) || to_tsvector('simple'::regconfig, COALESCE((string_agg((states.name)::text, ' '::text)), ''::text))) @@ '''test'':*'::tsquery)" 
"  -> Seq Scan on addresses (cost=0.00..5.14 rows=1 width=4792) (actual time=0.904..0.904 rows=0 loops=1)" 
"    Filter: (company_id = 2142)" 
"  -> HashAggregate (cost=6.19..6.20 rows=1 width=520) (never executed)" 
"    -> Hash Join (cost=1.02..6.18 rows=1 width=520) (never executed)" 
"     Hash Cond: (public.addresses.state_id = states.id)" 
"     -> Seq Scan on addresses (cost=0.00..5.11 rows=11 width=8) (never executed)" 
"     -> Hash (cost=1.01..1.01 rows=1 width=520) (never executed)" 
"       -> Seq Scan on states (cost=0.00..1.01 rows=1 width=520) (never executed)" 
"Total runtime: 1.226 ms" 

EDIT2

Addresses DDL

-- 
-- PostgreSQL database dump 
-- 

SET statement_timeout = 0; 
SET client_encoding = 'UTF8'; 
SET standard_conforming_strings = on; 
SET check_function_bodies = false; 
SET client_min_messages = warning; 

SET search_path = public, pg_catalog; 

SET default_tablespace = ''; 

SET default_with_oids = false; 

-- 
-- Name: addresses; Type: TABLE; Schema: public; Owner: trucking; Tablespace: 
-- 

CREATE TABLE addresses (
    id integer NOT NULL, 
    name character varying(255), 
    street character varying(255), 
    city character varying(255), 
    zip_code character varying(255), 
    primary_phone character varying(255), 
    alternate_phone character varying(255), 
    fax character varying(255), 
    email character varying(255), 
    contact character varying(255), 
    company_id integer DEFAULT 0 NOT NULL, 
    motor_carrier_number character varying(12), 
    state_id integer, 
    created_at timestamp without time zone, 
    updated_at timestamp without time zone, 
    alternate_phone2 character varying(12), 
    insurance_expires_on date, 
    notes text 
); 


ALTER TABLE public.addresses OWNER TO trucking; 

-- 
-- Name: addresses_id_seq; Type: SEQUENCE; Schema: public; Owner: trucking 
-- 

CREATE SEQUENCE addresses_id_seq 
    START WITH 1 
    INCREMENT BY 1 
    NO MINVALUE 
    NO MAXVALUE 
    CACHE 1; 


ALTER TABLE public.addresses_id_seq OWNER TO trucking; 

-- 
-- Name: addresses_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: trucking 
-- 

ALTER SEQUENCE addresses_id_seq OWNED BY addresses.id; 


-- 
-- Name: id; Type: DEFAULT; Schema: public; Owner: trucking 
-- 

ALTER TABLE ONLY addresses ALTER COLUMN id SET DEFAULT nextval('addresses_id_seq'::regclass); 


-- 
-- Name: addresses_pkey; Type: CONSTRAINT; Schema: public; Owner: trucking; Tablespace: 
-- 

ALTER TABLE ONLY addresses 
    ADD CONSTRAINT addresses_pkey PRIMARY KEY (id); 


-- 
-- Name: index_addresses_on_company_id; Type: INDEX; Schema: public; Owner: trucking; Tablespace: 
-- 

CREATE INDEX index_addresses_on_company_id ON addresses USING btree (company_id); 


-- 
-- Name: index_addresses_on_state_id; Type: INDEX; Schema: public; Owner: trucking; Tablespace: 
-- 

CREATE INDEX index_addresses_on_state_id ON addresses USING btree (state_id); 


-- 
-- Name: index_addresses_search_by_city; Type: INDEX; Schema: public; Owner: trucking; Tablespace: 
-- 

CREATE INDEX index_addresses_search_by_city ON addresses USING gin (to_tsvector('simple'::regconfig, COALESCE((city)::text, ''::text))); 


-- 
-- Name: index_addresses_search_by_email; Type: INDEX; Schema: public; Owner: trucking; Tablespace: 
-- 

CREATE INDEX index_addresses_search_by_email ON addresses USING gin (to_tsvector('simple'::regconfig, COALESCE((email)::text, ''::text))); 


-- 
-- Name: index_addresses_search_by_name; Type: INDEX; Schema: public; Owner: trucking; Tablespace: 
-- 

CREATE INDEX index_addresses_search_by_name ON addresses USING gin (to_tsvector('simple'::regconfig, COALESCE((name)::text, ''::text))); 


-- 
-- Name: index_addresses_search_by_street; Type: INDEX; Schema: public; Owner: trucking; Tablespace: 
-- 

CREATE INDEX index_addresses_search_by_street ON addresses USING gin (to_tsvector('simple'::regconfig, COALESCE((street)::text, ''::text))); 


-- 
-- PostgreSQL database dump complete 
-- 

States DDL

-- 
-- PostgreSQL database dump 
-- 

SET statement_timeout = 0; 
SET client_encoding = 'UTF8'; 
SET standard_conforming_strings = on; 
SET check_function_bodies = false; 
SET client_min_messages = warning; 

SET search_path = public, pg_catalog; 

SET default_tablespace = ''; 

SET default_with_oids = false; 

-- 
-- Name: states; Type: TABLE; Schema: public; Owner: trucking; Tablespace: 
-- 

CREATE TABLE states (
    id integer NOT NULL, 
    name character varying(255) NOT NULL, 
    abbrev character varying(255) NOT NULL, 
    country character varying(255) 
); 


ALTER TABLE public.states OWNER TO trucking; 

-- 
-- Name: states_id_seq; Type: SEQUENCE; Schema: public; Owner: trucking 
-- 

CREATE SEQUENCE states_id_seq 
    START WITH 1 
    INCREMENT BY 1 
    NO MINVALUE 
    NO MAXVALUE 
    CACHE 1; 


ALTER TABLE public.states_id_seq OWNER TO trucking; 

-- 
-- Name: states_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: trucking 
-- 

ALTER SEQUENCE states_id_seq OWNED BY states.id; 


-- 
-- Name: id; Type: DEFAULT; Schema: public; Owner: trucking 
-- 

ALTER TABLE ONLY states ALTER COLUMN id SET DEFAULT nextval('states_id_seq'::regclass); 


-- 
-- Name: states_pkey; Type: CONSTRAINT; Schema: public; Owner: trucking; Tablespace: 
-- 

ALTER TABLE ONLY states 
    ADD CONSTRAINT states_pkey PRIMARY KEY (id); 


-- 
-- Name: index_states_search_by_abbrev; Type: INDEX; Schema: public; Owner: trucking; Tablespace: 
-- 

CREATE INDEX index_states_search_by_abbrev ON states USING gin (to_tsvector('simple'::regconfig, COALESCE((abbrev)::text, ''::text))); 


-- 
-- Name: index_states_search_by_name; Type: INDEX; Schema: public; Owner: trucking; Tablespace: 
-- 

CREATE INDEX index_states_search_by_name ON states USING gin (to_tsvector('simple'::regconfig, COALESCE((name)::text, ''::text))); 


-- 
-- PostgreSQL database dump complete 
-- 

Best rewrite so far

SELECT consolidated_address.id, (ts_rank((to_tsvector('simple', coalesce(consolidated_address.name::text, '')) || to_tsvector('simple', coalesce(consolidated_address.state_name::text, ''))), (to_tsquery('simple', ''' ' || 'Gallaway' || ' ''' || ':*')), 0)) AS pg_search_rank 
FROM (
SELECT "addresses".id, 
    "addresses".name, 
    string_agg("states".name::text, ' ') as state_name 
    FROM addresses 
    LEFT OUTER JOIN "states" 
    ON "states".id = "addresses".state_id 
    GROUP BY "addresses".id) consolidated_address 
WHERE 
    (((to_tsvector('simple', coalesce(consolidated_address.name::text, '')) || to_tsvector('simple', coalesce(consolidated_address.state_name::text, ''))) @@ (to_tsquery('simple', ''' ' || 'Gallaway' || ' ''' || ':*')))) 

This is a bit faster, but it still does not use any indexes.

Thanks,

+0

For future questions, please make sure to show your Pg version and, for slow-query questions, the output of "EXPLAIN ANALYZE SELECT ...your query...". See https://wiki.postgresql.org/wiki/Slow_Query_Questions – 2012-08-14 00:17:41

+0

Edited the question, thanks @CraigRinger – Calin 2012-08-14 12:14:59

+0

Did you ANALYZE your tables after creating the indexes? Can we see the DDL of the tables and constraints involved? – vyegorov 2012-08-14 12:44:36

Answers

0

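This query is slow because the left outer join targets a subquery that aggregates over the entire addresses table, and the result is restricted by company_id only afterwards. Does the addresses table have a company_id column? Try moving the WHERE clause into the inner query to limit the number of records it returns.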
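
A sketch of what this suggestion might look like, assuming the schema from the question. The aliases a, s and state_names are illustrative and are not what pg_search generates, and the tsquery literal 'test:*' is a shorthand that parses to the same tsquery the EXPLAIN output shows. It only reduces the number of rows being aggregated; it does not by itself make the planner use the GIN indexes.

```sql
-- Same search as in the question, but the company_id filter is also applied
-- inside the aggregating subquery, so only that company's addresses are
-- joined to states and grouped.
SELECT a.*,
       ts_rank(
         to_tsvector('simple', coalesce(a.name::text, ''))
           || to_tsvector('simple', coalesce(s.state_names::text, '')),
         to_tsquery('simple', 'test:*'),
         0
       ) AS pg_search_rank
FROM addresses a
LEFT OUTER JOIN (
  SELECT addresses.id AS id,
         string_agg(states.name::text, ' ') AS state_names
  FROM addresses
  INNER JOIN states ON states.id = addresses.state_id
  WHERE addresses.company_id = 2142   -- filter moved into the inner query
  GROUP BY addresses.id
) s ON s.id = a.id
WHERE a.company_id = 2142
  AND (to_tsvector('simple', coalesce(a.name::text, ''))
         || to_tsvector('simple', coalesce(s.state_names::text, '')))
      @@ to_tsquery('simple', 'test:*')
ORDER BY pg_search_rank DESC, a.id ASC;
```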

+0

Thanks for the answer, but it doesn't explain why the indexes are not used – Calin 2012-08-14 11:28:21

+0

By the way, I will +1 this answer if you could rewrite the query with your suggestion. – Calin 2012-08-14 11:28:49

1

Answer cross-posted from https://github.com/Casecommons/pg_search/issues/51

From GitHub, by the author of the pg_search Ruby gem that I use to generate this query:


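Yeah, unfortunately I don't know of a way to get :associated_against queries to work against indexes, at least the way things are currently implemented.

This is because :associated_against searches against the text of all associated records concatenated together, rather than against a single record. For example, if you join to a tags table and there are 3 records ("foo", "bar" and "baz"), then you would want a search for "foo baz" to find it. A more easily indexable solution would work only for the queries "foo" or "baz" but not for "foo baz", because no single tag row matches both terms, if you know what I mean.

It isn't possible to build an index across multiple records (at least as far as I know).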

Maybe we could add an option to pg_search to do record-by-record searching, which could use indexes but would not match across records.
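
A minimal sketch of what such a record-by-record search might look like in plain SQL, assuming the schema and the expression indexes from the question. This is not something pg_search generates. Each branch filters on exactly the expression that one of the GIN indexes is built on, so the planner is at least able to consider that index; whether it actually does depends on table size and statistics. As described above, a multi-word query only matches when a single column contains all the terms.

```sql
-- Branch 1: the address name itself matches.
SELECT a.id
FROM addresses a
WHERE a.company_id = 2142
  AND to_tsvector('simple', coalesce(a.name::text, ''))
      @@ to_tsquery('simple', 'test:*')

UNION

-- Branch 2: the name of the address's state matches.
SELECT a.id
FROM addresses a
INNER JOIN states s ON s.id = a.state_id
WHERE a.company_id = 2142
  AND to_tsvector('simple', coalesce(s.name::text, ''))
      @@ to_tsquery('simple', 'test:*');
```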

+0

I would like to see a skilled SQL dude rewrite this query, if there is a way – Calin 2012-08-14 19:09:34

+0

If there is a way to do this, let me know and I will update pg_search to use it. – nertzy 2012-09-09 22:40:04

1

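My reading of your EXPLAIN ANALYZE output is that your tables are too small for an index to be used. Sequential scans are relatively cheap in PostgreSQL because tables are stored as physical files, and PostgreSQL can take advantage of operating-system prefetching and the like.

Keep in mind that no plan is faster than loading a single page from disk and scanning it sequentially. Using an index would only add overhead. The sequential scans take no significant time here, so don't worry about it until you have real data to play with.

It is worth noting that on PostgreSQL, optimizing queries before you have data is a recipe for more problems than solutions. My sincere advice is to hold off on analyzing the query until you have real data for it to return. Then we can determine which indexes would help.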
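
One way to check whether an index is usable at all, as opposed to merely not chosen on a tiny table, is to discourage sequential scans for a single transaction and look at the plan again. The sketch below assumes the index_addresses_search_by_name index from the question; enable_seqscan is a standard planner setting that penalizes sequential scans rather than forbidding them, and it is meant for diagnosis, not production use.

```sql
BEGIN;
-- Affects only this transaction; reverted automatically at ROLLBACK.
SET LOCAL enable_seqscan = off;

-- The WHERE expression matches the indexed expression, so the plan
-- can show whether the GIN index on addresses.name is usable.
EXPLAIN ANALYZE
SELECT id
FROM addresses
WHERE to_tsvector('simple', coalesce(name::text, ''))
      @@ to_tsquery('simple', 'test:*');

ROLLBACK;
```

If the index appears in this plan but not in the normal one, the planner simply judged the sequential scan cheaper for the current data volume.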

Edit: You say the query is slow, but the plan shows it executing in about 1 ms. How long do you expect it to take?
