Improving query speed: simple SELECT with LIKE

I've inherited a large legacy codebase that runs on Django 1.5, and my current task is to speed up a section of the site that takes ~1 min to load.

I profiled the application, and the particular culprit is the following query (stripped for brevity):

SELECT COUNT(*) FROM "entities_entity" WHERE (
    "entities_entity"."date_filed" <= '2016-01-21' AND (
    UPPER("entities_entity"."entity_city_state_zip"::text) LIKE UPPER('%Atherton%') OR 
    UPPER("entities_entity"."entity_city_state_zip"::text) LIKE UPPER('%Berkeley%') OR 
    -- 34 more of these 
    UPPER("entities_entity"."agent_city_state_zip"::text) LIKE UPPER('%Atherton%') OR 
    UPPER("entities_entity"."agent_city_state_zip"::text) LIKE UPPER('%Berkeley%') OR 
    -- 34 more of these 
) 
)

Basically, it's one big query on two fields, entity_city_state_zip and agent_city_state_zip, which are character varying(200) | not null fields.

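The query is executed twice (!), taking 18814.02 ms each time, and once more with a SELECT in place of the COUNT, which takes an extra 20216.49 ms (I'm going to cache the COUNT result).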

The EXPLAIN looks like this:

Aggregate (cost=175867.33..175867.34 rows=1 width=0) (actual time=17841.502..17841.502 rows=1 loops=1) 
    -> Seq Scan on entities_entity (cost=0.00..175858.95 rows=3351 width=0) (actual time=0.849..17818.551 rows=145075 loops=1) 
     Filter: ((date_filed <= '2016-01-21'::date) AND ((upper((entity_city_state_zip)::text) ~~ '%ATHERTON%'::text) OR (upper((entity_city_state_zip)::text) ~~ '%BERKELEY%'::text) (..skipped..) OR (upper((agent_city_state_zip)::text) ~~ '%ATHERTON%'::text) OR (upper((agent_city_state_zip)::text) ~~ '%BERKELEY%'::text) OR (upper((agent_city_state_zip)::text) ~~ '%BURLINGAME%'::text))) 
     Rows Removed by Filter: 310249 
Planning time: 2.110 ms 
Execution time: 17841.944 ms

I tried indexes on entity_city_state_zip and agent_city_state_zip in various combinations, such as:

CREATE INDEX ON entities_entity (upper(entity_city_state_zip)); 
CREATE INDEX ON entities_entity (upper(agent_city_state_zip));

or with varchar_pattern_ops, with no luck.
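Why neither index helps can be modelled with a sorted list, which is roughly what a B-tree index (with or without varchar_pattern_ops) offers the planner. The sketch below is a toy Python model, not PostgreSQL code; the function names and the city/state/zip sample strings are hypothetical. An anchored pattern such as LIKE 'SAN FRANCISCO%' becomes one binary search plus a contiguous range, while LIKE '%SAN FRANCISCO%' gets nothing from the sort order, so every key has to be examined.

```python
import bisect

# Toy model: a B-tree index behaves like a sorted list of keys.
def prefix_matches(sorted_keys, prefix):
    """Keys matching LIKE 'prefix%': one binary search, then a contiguous range."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can share the prefix
        matches.append(key)
    return matches

def substring_matches(sorted_keys, needle):
    """Keys matching LIKE '%needle%': sort order is useless, so check every key."""
    return [key for key in sorted_keys if needle in key]

# Hypothetical upper-cased values, as produced by upper(entity_city_state_zip).
keys = sorted([
    "ATHERTON CA 94027",
    "BERKELEY CA 94704",
    "SAN FRANCISCO CA 94103",
    "SOUTH SAN FRANCISCO CA 94080",
])

print(prefix_matches(keys, "SAN FRANCISCO"))     # ['SAN FRANCISCO CA 94103']
print(substring_matches(keys, "SAN FRANCISCO"))  # ['SAN FRANCISCO CA 94103', 'SOUTH SAN FRANCISCO CA 94080']
```

The prefix search stops as soon as it leaves the matching range; the substring search has no such exit, which is the in-memory equivalent of the Seq Scan in the plan.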

The server uses something like this:

qs = queryset.filter(Q(entity_city_state_zip__icontains = all_city_list) | 
        Q(agent_city_state_zip__icontains = all_city_list))

to generate the query.

I don't know what else to try.

Thanks!

2016-01-21 NicoSantangelo

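LIKE queries that start with '%...' won't use any B-tree index (including xxx_pattern_ops). Those indexes are only chosen when the pattern is anchored at the start (e.g. col LIKE 'XXX%' or col ~ '^XXX'). You could try the pg_trgm module (http://www.postgresql.org/docs/current/static/pgtrgm.html), which gives you a suitable index (http://dba.stackexchange.com/questions/10694/pattern-matching-with-like-similar-to-or-regular-expressions-in-postgresql/10696). (You can also use ilike instead of like plus lower()/upper() calls.) – pozs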
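The idea behind pg_trgm can be sketched in a few lines. This is an approximation for illustration only (the helper names are hypothetical, and the real module handles non-ASCII text, wildcard edges and index storage in its own way): per the pg_trgm documentation, each word is padded with two leading spaces and one trailing space before its three-character windows are collected, so 'cat' yields '  c', ' ca', 'cat' and 'at '. A '%needle%' pattern can then be pre-filtered by requiring the needle's trigrams to be present, and only the candidate rows need the real LIKE check. In PostgreSQL itself this corresponds to an index built with the gin_trgm_ops (or gist_trgm_ops) operator class, which can serve LIKE and ILIKE.

```python
import re

def trigrams(text):
    """Approximate pg_trgm extraction for a stored value: split into alphanumeric
    words, lowercase, pad each word with two leading spaces and one trailing
    space, then collect every 3-character window."""
    result = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result

def pattern_trigrams(needle):
    """Trigrams that any value matching LIKE '%needle%' must contain.
    This sketch conservatively adds no padding, since a match may start or end mid-word."""
    result = set()
    for word in re.findall(r"[0-9a-z]+", needle.lower()):
        for i in range(len(word) - 2):
            result.add(word[i:i + 3])
    return result

def may_match(value, needle):
    """Index-style pre-filter: True only means 'candidate row'. The real LIKE must
    still be rechecked, because the trigrams may be present in a different order."""
    return pattern_trigrams(needle) <= trigrams(value)

print(sorted(trigrams("cat")))                     # ['  c', ' ca', 'at ', 'cat']
print(may_match("BERKELEY CA 94704", "Berkeley"))  # True
print(may_match("ATHERTON CA 94027", "Berkeley"))  # False
```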

@pozs I didn't know that! I'll give it a try – NicoSantangelo

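I'd at least like to know how much the Seq Scan contributes, and whether an index scan could be used instead. See what effect set enable_seqscan = false has on the plan. Is the database running on an SSD? –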

I think the problem is in the "multiple LIKE" and UPPER("entities_entity...

You can use:

WHERE entities_entity.entity_city_state_zip SIMILAR TO '%Atherton%|%Berkeley%'

Or something like this:

WHERE entities_entity.entity_city_state_zip LIKE ANY(ARRAY['%Atherton%', '%Berkeley%'])

EDIT

About raw SQL queries in Django:

Regards

2016-01-21 15:41:29

I didn't know LIKE supported ANY with an array as the value. My problem is getting Django to create that query; I'll google a bit and see what I can find – NicoSantangelo

That's Postgres :) As for Django... I think a raw query is what you want: https://docs.djangoproject.com/es/1.9/topics/db/sql/ and read this link: http://stackoverflow.com/questions/31698103/how-do-i-execute-raw-sql-in-a-django-migration –
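One way to get the LIKE ANY form out of Django is to build the SQL yourself and pass the city list as a single array parameter. The sketch below assumes PostgreSQL accessed through psycopg2, which adapts a Python list to an SQL ARRAY; the function names are hypothetical. It also swaps LIKE for ILIKE, as suggested in the comments, so the match stays case-insensitive without UPPER().

```python
def escape_like(value):
    """Escape LIKE wildcards so a city name matches literally.
    Backslash is PostgreSQL's default LIKE escape character."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def build_city_count_query(cities, filed_on_or_before):
    """Return (sql, params) for one COUNT query replacing the chain of
    UPPER(...) LIKE UPPER(...) OR ... clauses.

    ILIKE is case-insensitive, so UPPER() is unnecessary, and ANY(%s) tests
    the column against every pattern in one array parameter. The wildcards
    travel inside the parameters, so the SQL text needs no doubled '%%'.
    """
    patterns = ["%" + escape_like(city) + "%" for city in cities]
    sql = (
        'SELECT COUNT(*) FROM "entities_entity" '
        'WHERE "date_filed" <= %s '
        'AND ("entity_city_state_zip" ILIKE ANY(%s) '
        'OR "agent_city_state_zip" ILIKE ANY(%s))'
    )
    return sql, [filed_on_or_before, patterns, patterns]

sql, params = build_city_count_query(["Atherton", "Berkeley"], "2016-01-21")
print(params)
# ['2016-01-21', ['%Atherton%', '%Berkeley%'], ['%Atherton%', '%Berkeley%']]
```

In Django 1.5 the pair could be run through a raw cursor: cursor = django.db.connection.cursor(), then cursor.execute(sql, params) and cursor.fetchone()[0]. If the page needs a QuerySet rather than a count, the same two WHERE fragments should in principle be attachable with queryset.extra(where=[...], params=[...]).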

I watched a Pluralsight course that addresses a very similar problem. The course is "Postgres for .NET Developers", in the "Fun with Simple SQL" module, "Full-Text Search" section.

To summarize their solution, using your example:

Create a new column in your table that represents your entity_city_state_zip as a tsvector:

create table entities_entity (
    date_filed date, 
    entity_city_state_zip text, 
    csz_search tsvector not null -- add this column 
);

Initially you may want to leave it nullable, then populate the data and make it not null:

update entities_entity 
set csz_search = to_tsvector('pg_catalog.english', entity_city_state_zip);  -- same configuration as the trigger below

Next, create a trigger that populates the new field any time a record is added or modified:

create trigger entities_insert_update 
before insert or update on entities_entity 
for each row execute procedure 
tsvector_update_trigger(csz_search,'pg_catalog.english',entity_city_state_zip);

Search queries can now query the tsvector field instead of the city/state/zip field:

select * from entities_entity 
where csz_search @@ to_tsquery('Atherton')

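Some notes of interest:

to_tsquery, if you haven't used it, is more sophisticated than the example above. It allows AND conditions, partial matches, etc.
It is also case-insensitive, so there is no need for the upper function you have in your query.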
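The whole chain of LIKE clauses could then collapse into a single to_tsquery() argument. A minimal sketch, assuming the city list is plain text and using a hypothetical helper name: city names are OR-ed with |, the words of a multi-word name are AND-ed with &, and everything except letters and digits is dropped so the input cannot inject tsquery syntax. Unlike LIKE '%...%', full-text search matches whole words (lexemes) rather than arbitrary substrings, and & does not require the words to be adjacent (PostgreSQL 9.6 and later also offer the <-> phrase operator for that).

```python
import re

def cities_to_tsquery(cities):
    """Build one to_tsquery() string from a list of city names,
    e.g. ['Atherton', 'San Mateo'] -> 'atherton | (san & mateo)'.

    Only letters and digits survive, so tsquery operators in the input are dropped.
    Returns '' for an empty list; the caller should skip the query in that case.
    """
    terms = []
    for city in cities:
        words = re.findall(r"[0-9a-z]+", city.lower())
        if not words:
            continue
        term = " & ".join(words)
        terms.append("(" + term + ")" if len(words) > 1 else term)
    return " | ".join(terms)

print(cities_to_tsquery(["Atherton", "Berkeley", "San Mateo"]))
# atherton | berkeley | (san & mateo)
```

The resulting string would be passed as a parameter, e.g. WHERE csz_search @@ to_tsquery('pg_catalog.english', %s), using the same configuration as the trigger so the query words are stemmed the same way as the stored lexemes.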

Final step: put a GIN index on the tsvector field:

create index entities_entity_ix1 on entities_entity 
using gin(csz_search);

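If I understood the course correctly, this should make your query fly, and it gets around the inability of B-tree indexes to work on like '%... queries.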
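Why the GIN index makes this fast can be pictured as an inverted index: each lexeme points to the set of rows that contain it, so an OR query becomes a union of a few posting lists instead of evaluating up to 72 LIKE expressions (36 per column) on each of the 455,324 rows the question's sequential scan read (145,075 kept plus 310,249 removed). This is a conceptual Python model, not PostgreSQL's GIN implementation; the row ids and lexeme sets are hypothetical and ignore stemming.

```python
from collections import defaultdict

def build_inverted_index(rows):
    """Map each lexeme to the set of row ids whose tsvector contains it."""
    index = defaultdict(set)
    for row_id, lexemes in rows.items():
        for lexeme in lexemes:
            index[lexeme].add(row_id)
    return index

def search_any(index, terms):
    """Rows matching 'term1 | term2 | ...': the union of the posting lists."""
    matches = set()
    for term in terms:
        matches |= index.get(term, set())
    return matches

def search_all(index, terms):
    """Rows matching 'term1 & term2 & ...': the intersection of the posting lists."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

# Hypothetical rows: id -> lexemes of entity_city_state_zip.
rows = {
    1: {"atherton", "ca", "94027"},
    2: {"berkeley", "ca", "94704"},
    3: {"san", "mateo", "ca", "94401"},
    4: {"burlingame", "ca", "94010"},
}
index = build_inverted_index(rows)

print(sorted(search_any(index, ["atherton", "berkeley"])))  # [1, 2]
print(sorted(search_all(index, ["san", "mateo"])))          # [3]
```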

Here is the explain plan for such a query:

Bitmap Heap Scan on entities_entity (cost=56.16..1204.78 rows=505 width=81) 
    Recheck Cond: (csz_search @@ to_tsquery('Atherton'::text)) 
    -> Bitmap Index Scan on entities_entity_ix1 (cost=0.00..56.04 rows=505 width=0) 
     Index Cond: (csz_search @@ to_tsquery('Atherton'::text))

2016-01-22 03:52:16 Hambone

This is really cool, I'll try it as soon as I can – NicoSantangelo

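This is really great. I did some quick tests on about 2,000,000 rows of data: this approach took about 300 ms, versus about 2.4 seconds for the conventional query. With nested OR conditions on a larger dataset, I bet the difference would be even more dramatic. – Hambone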
