2016-05-31 86 views
1

我有一個功能的性能問題,我想明白爲什麼。我不習慣閱讀EXPLAIN結果,所以我要求您提供建議。與功能性能問題與PostgreSQL的SELECT查詢

我已經創建了一個基於搜索標準返回ID的功能:

scientific.selectChannels(_measurelabel := '{O3,SO2,NO2}', _siteid:='{41R001}', _networkid := '{RTU}') 

它返回一個數組,並採取小於60ms的運行:

'{120,122,125}' 

當我執行此查詢時,答案確實很慢:

WITH 
D AS (
SELECT 
    DISTINCT ON (ChannelId) 
    ChannelId 
    ,TimeValue 
    ,FloatValue 
FROM 
    datastore.inline 
WHERE 
    -- ANY('{120,122,125}') 
    channelId = ANY(scientific.selectChannels(_measurelabel := '{O3,SO2,NO2}', _siteid:='{41R001}', _networkid := '{RTU}')) 
    AND QualityCodeId = 6 
    AND GranulityId = 1 
    AND TimeValue >= date_trunc('day', now()::TIMESTAMP) AND TimeValue < date_trunc('day', now()::TIMESTAMP) + '1 day'::INTERVAL 
ORDER BY 
    ChannelId 
    ,TimeValue DESC 
) 

SELECT 
    D.* 
    ,CM.ChannelUnitsId 
    ,CM.MeasureLabel 
    ,CM.SiteId 
    ,CM.SiteLabel 
FROM 
    D JOIN scientific.channelMetadata AS CM ON (D.ChannelId = CM.ChannelId) 

查詢刨花返回:

"Hash Join (cost=12320.23..12331.54 rows=3 width=332) (actual time=9150.405..9150.463 rows=3 loops=1)" 
" Hash Cond: (s.channelid = d.channelid)" 
" CTE d" 
" -> Unique (cost=12191.27..12192.55 rows=1 width=20) (actual time=9146.634..9146.647 rows=3 loops=1)" 
"   -> Sort (cost=12191.27..12191.91 rows=256 width=20) (actual time=9146.633..9146.644 rows=108 loops=1)" 
"    Sort Key: inline.channelid, inline.timevalue DESC" 
"    Sort Method: quicksort Memory: 33kB" 
"    -> Bitmap Heap Scan on inline (cost=197.79..12181.03 rows=256 width=20) (actual time=6729.253..9146.572 rows=108 loops=1)" 
"      Recheck Cond: ((timevalue >= date_trunc('day'::text, (now())::timestamp without time zone)) AND (timevalue < (date_trunc('day'::text, (now())::timestamp without time zone) + '1 day'::interval)))" 
"      Filter: ((qualitycodeid = 6) AND (granulityid = 1) AND (channelid = ANY (scientific.selectchannels('{41R001}'::text[], '{}'::text[], '{O3,SO2,NO2}'::text[], NULL::text[], '{RTU}'::text[], NULL::text[], NULL::text[], false))))" 
"      Rows Removed by Filter: 6132" 
"      Heap Blocks: exact=104" 
"      -> Bitmap Index Scan on idx_inline_timevalue (cost=0.00..197.73 rows=5729 width=0) (actual time=1.016..1.016 rows=6240 loops=1)" 
"       Index Cond: ((timevalue >= date_trunc('day'::text, (now())::timestamp without time zone)) AND (timevalue < (date_trunc('day'::text, (now())::timestamp without time zone) + '1 day'::interval)))" 
" -> Sort (cost=127.65..129.39 rows=694 width=450) (actual time=3.714..3.740 rows=694 loops=1)" 
"  Sort Key: s.siteid, s.networkid, s.measureid, s.jokerid" 
"  Sort Method: quicksort Memory: 118kB" 
"  CTE s" 
"   -> Hash Join (cost=14.44..81.02 rows=694 width=182) (actual time=0.117..1.219 rows=694 loops=1)" 
"    Hash Cond: ((c.unitsid)::text = (u.id)::text)" 
"    -> Hash Join (cost=12.79..66.35 rows=694 width=150) (actual time=0.098..0.866 rows=694 loops=1)" 
"      Hash Cond: ((c.networkid)::text = (n.id)::text)" 
"      -> Hash Join (cost=11.43..55.45 rows=694 width=137) (actual time=0.088..0.644 rows=694 loops=1)" 
"       Hash Cond: ((c.siteid)::text = (s_1.id)::text)" 
"       -> Hash Join (cost=5.15..39.63 rows=694 width=122) (actual time=0.052..0.406 rows=694 loops=1)" 
"         Hash Cond: ((c.measureid)::text = (m.id)::text)" 
"         -> Seq Scan on channel c (cost=0.00..24.94 rows=694 width=97) (actual time=0.001..0.164 rows=694 loops=1)" 
"         -> Hash (cost=3.40..3.40 rows=140 width=31) (actual time=0.047..0.047 rows=140 loops=1)" 
"          Buckets: 1024 Batches: 1 Memory Usage: 16kB" 
"          -> Seq Scan on measure m (cost=0.00..3.40 rows=140 width=31) (actual time=0.002..0.023 rows=140 loops=1)" 
"       -> Hash (cost=3.90..3.90 rows=190 width=22) (actual time=0.032..0.032 rows=101 loops=1)" 
"         Buckets: 1024 Batches: 1 Memory Usage: 14kB" 
"         -> Seq Scan on site s_1 (cost=0.00..3.90 rows=190 width=22) (actual time=0.002..0.014 rows=101 loops=1)" 
"      -> Hash (cost=1.16..1.16 rows=16 width=17) (actual time=0.007..0.007 rows=18 loops=1)" 
"       Buckets: 1024 Batches: 1 Memory Usage: 9kB" 
"       -> Seq Scan on network n (cost=0.00..1.16 rows=16 width=17) (actual time=0.001..0.002 rows=18 loops=1)" 
"    -> Hash (cost=1.29..1.29 rows=29 width=36) (actual time=0.013..0.013 rows=29 loops=1)" 
"      Buckets: 1024 Batches: 1 Memory Usage: 10kB" 
"      -> Seq Scan on units u (cost=0.00..1.29 rows=29 width=36) (actual time=0.003..0.009 rows=29 loops=1)" 
"  -> CTE Scan on s (cost=0.00..13.88 rows=694 width=450) (actual time=0.119..1.839 rows=694 loops=1)" 
" -> Hash (cost=0.02..0.02 rows=1 width=20) (actual time=9146.650..9146.650 rows=3 loops=1)" 
"  Buckets: 1024 Batches: 1 Memory Usage: 9kB" 
"  -> CTE Scan on d (cost=0.00..0.02 rows=1 width=20) (actual time=9146.636..9146.649 rows=3 loops=1)" 
"Planning time: 1.931 ms" 
"Execution time: 9150.594 ms" 

大約需要10s取3行。 如果我用硬編碼的結果'{120,122,125}'來替換函數調用,它會少於100ms。查詢刨然後返回:

"Hash Join (cost=376.05..387.36 rows=3 width=332) (actual time=7.483..7.595 rows=3 loops=1)" 
" Hash Cond: (s.channelid = d.channelid)" 
" CTE d" 
" -> Unique (cost=247.94..248.37 rows=1 width=20) (actual time=0.172..0.194 rows=3 loops=1)" 
"   -> Sort (cost=247.94..248.15 rows=87 width=20) (actual time=0.172..0.182 rows=108 loops=1)" 
"    Sort Key: inline.channelid, inline.timevalue DESC" 
"    Sort Method: quicksort Memory: 33kB" 
"    -> Index Scan using uq_inline on inline (cost=0.44..245.13 rows=87 width=20) (actual time=0.036..0.109 rows=108 loops=1)" 
"      Index Cond: ((qualitycodeid = 6) AND (channelid = ANY ('{120,122,125}'::integer[])) AND (granulityid = 1) AND (timevalue >= date_trunc('day'::text, (now())::timestamp without time zone)) AND (timevalue < (date_trunc('day'::text, (now((...)" 
" -> Sort (cost=127.65..129.39 rows=694 width=450) (actual time=7.215..7.269 rows=694 loops=1)" 
"  Sort Key: s.siteid, s.networkid, s.measureid, s.jokerid" 
"  Sort Method: quicksort Memory: 118kB" 
"  CTE s" 
"   -> Hash Join (cost=14.44..81.02 rows=694 width=182) (actual time=0.225..2.351 rows=694 loops=1)" 
"    Hash Cond: ((c.unitsid)::text = (u.id)::text)" 
"    -> Hash Join (cost=12.79..66.35 rows=694 width=150) (actual time=0.190..1.661 rows=694 loops=1)" 
"      Hash Cond: ((c.networkid)::text = (n.id)::text)" 
"      -> Hash Join (cost=11.43..55.45 rows=694 width=137) (actual time=0.173..1.279 rows=694 loops=1)" 
"       Hash Cond: ((c.siteid)::text = (s_1.id)::text)" 
"       -> Hash Join (cost=5.15..39.63 rows=694 width=122) (actual time=0.106..0.817 rows=694 loops=1)" 
"         Hash Cond: ((c.measureid)::text = (m.id)::text)" 
"         -> Seq Scan on channel c (cost=0.00..24.94 rows=694 width=97) (actual time=0.002..0.306 rows=694 loops=1)" 
"         -> Hash (cost=3.40..3.40 rows=140 width=31) (actual time=0.096..0.096 rows=140 loops=1)" 
"          Buckets: 1024 Batches: 1 Memory Usage: 16kB" 
"          -> Seq Scan on measure m (cost=0.00..3.40 rows=140 width=31) (actual time=0.005..0.040 rows=140 loops=1)" 
"       -> Hash (cost=3.90..3.90 rows=190 width=22) (actual time=0.063..0.063 rows=101 loops=1)" 
"         Buckets: 1024 Batches: 1 Memory Usage: 14kB" 
"         -> Seq Scan on site s_1 (cost=0.00..3.90 rows=190 width=22) (actual time=0.005..0.037 rows=101 loops=1)" 
"      -> Hash (cost=1.16..1.16 rows=16 width=17) (actual time=0.011..0.011 rows=18 loops=1)" 
"       Buckets: 1024 Batches: 1 Memory Usage: 9kB" 
"       -> Seq Scan on network n (cost=0.00..1.16 rows=16 width=17) (actual time=0.002..0.003 rows=18 loops=1)" 
"    -> Hash (cost=1.29..1.29 rows=29 width=36) (actual time=0.027..0.027 rows=29 loops=1)" 
"      Buckets: 1024 Batches: 1 Memory Usage: 10kB" 
"      -> Seq Scan on units u (cost=0.00..1.29 rows=29 width=36) (actual time=0.007..0.016 rows=29 loops=1)" 
"  -> CTE Scan on s (cost=0.00..13.88 rows=694 width=450) (actual time=0.228..3.590 rows=694 loops=1)" 
" -> Hash (cost=0.02..0.02 rows=1 width=20) (actual time=0.202..0.202 rows=3 loops=1)" 
"  Buckets: 1024 Batches: 1 Memory Usage: 9kB" 
"  -> CTE Scan on d (cost=0.00..0.02 rows=1 width=20) (actual time=0.175..0.199 rows=3 loops=1)" 
"Planning time: 1.676 ms" 
"Execution time: 7.734 ms" 

有人可以解釋爲什麼我的函數掛在這個過程? 功能定義如下:

CREATE OR REPLACE FUNCTION scientific.selectChannels(
    _siteid TEXT[] = '{%}', 
    _measureid TEXT[] = '{}', 
    _measurelabel TEXT[] = '{%}', 
    _jokerid TEXT[] = NULL, 
    _networkid TEXT[] = NULL, 
    _sitetype TEXT[] = NULL, 
    _tags TEXT[] = NULL, 
    _inactive BOOLEAN = FALSE 
) 
RETURNS INTEGER[] AS 
$selectChannels$ 

SELECT 
array(
SELECT 
    CM.ChannelId 
FROM 
    scientific.ChannelMetadata AS CM 
WHERE 
    (CM.SiteId ILIKE ANY(_siteid)) 
    AND ((CM.MeasureId ILIKE ANY(_measureid)) OR (CM.MeasureLabel LIKE ANY(_measurelabel))) 
    AND ((CM.JokerId ILIKE ANY(_jokerid)) OR (_jokerid IS NULL)) 
    AND (CM.NetworkId ILIKE ANY(_networkid) OR (_networkid IS NULL)) 
    AND ((CM.SiteType ILIKE ANY(_sitetype)) OR (_sitetype IS NULL) OR (CM.SiteType IS NULL)) 
    AND ((array_lowercase(_tags) && array_lowercase(CM.Tags)) OR (_tags IS NULL) OR (CM.Tags IS NULL)) 
    AND (_inactive OR CM.ActiveFlag) 
ORDER BY 
    -- Do not alter this ORDER BY clause (Python API relies on this invariant: channels are ordered by id): 
    CM.ChannelId 
); 

$selectChannels$ 
LANGUAGE SQL; 

channelMetadaVIEW合併5個表。

我可以看到,在第一次運行rows removed(我不知道如何解釋此消息),但它可能是答案的一部分。 我認爲唯一的事情是,不是一次計算,而是多次調用我的函數,併成爲我查詢的瓶頸。 我不知道如何解決這個問題。

的問題是:

  • 正所謂不止一次我的功能?
  • 我必須更改我的功能簽名嗎?
  • 我在做什麼這個查詢錯誤?

更新

如果我提取陣列和查詢重建,表演從很遠越好。 爲什麼我的功能簽名會減慢過程?

[...] 
channelId = ANY(array(SELECT scientific.selectChannels(_measurelabel := '{O3,SO2,NO2}', _siteid:='{41R001}', _networkid := '{RTU}'))) 
[...] 

回答

4

正所謂我的功能不止一次?

是的。

它被CTE返回的108行中的每一行調用。實際上,幸運的是,TimeValue列中的條件可以通過索引來解決,否則您的函數將被調用一次,用於表inline中的6240行中的每一行,而不僅僅是由其他部分選擇的108行where條件。

當你的功能不依賴於任何文件從該表中檢索的值,你也許可以通過把結果放入另一個CTE改善這一點:

WITH func_result (channels) AS (
    select scientific.selectChannels(_measurelabel := '{O3,SO2,NO2}', _siteid:='{41R001}', _networkid := '{RTU}') 
), D AS (
    SELECT DISTINCT ON (ChannelId) ChannelId ,TimeValue, FloatValue 
    FROM datastore.inline 
    WHERE 
    channelId = ANY((select channels from func_result)::int[]) 
    AND QualityCodeId = 6 
    AND GranulityId = 1 
    AND TimeValue >= date_trunc('day', now()::TIMESTAMP) AND TimeValue < date_trunc('day', now()::TIMESTAMP) + '1 day'::INTERVAL 
    ORDER BY ChannelId, TimeValue DESC 
) 
SELECT 
    D.* 
    ,CM.ChannelUnitsId 
    ,CM.MeasureLabel 
    ,CM.SiteId 
    ,CM.SiteLabel 
FROM 
    D JOIN scientific.channelMetadata AS CM ON (D.ChannelId = CM.ChannelId) 

而不是

FROM datastore.inline 
WHERE channelId = ANY((select channels from func_result)::int[]) 

你也可以使用:

FROM datastore.inline 
    JOIN func_result f on channelid = any(f.channels) 
WHERE QualityCodeId = 6 
    AND GranulityId = 1 
    AND TimeValue ... 

這將導致在功能作c只有一次。


理論上你可以通過它定義爲immutable調用該函數只有一次欺騙的Postgres。但是,這將是一個公然的謊言(和一個非常骯髒的伎倆),因爲該功能確實做了數據庫查找和功能,那永遠不應該被定義爲immutable