Faster retrieval of multiple values from a large jsonb field (PostgreSQL 9.4)

TL;DR

Using PostgreSQL 9.4, is there a way to retrieve multiple values from a jsonb field, as you would with this imaginary function:

jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])

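The hope is to speed up the otherwise almost linear time required to select multiple values (1 value = 300 ms, 2 values = 450 ms, 3 values = 600 ms).

Background

I have the following jsonb table: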

CREATE TABLE "public"."analysis" (
    "date" date NOT NULL, 
    "name" character varying (10) NOT NULL, 
    "country" character (3) NOT NULL, 
    "x" jsonb, 
    PRIMARY KEY(date,name) 
);

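It has about 100,000 rows, each holding a jsonb dictionary with 90+ keys and corresponding values. I'm trying to write an SQL query that selects a few (< 10) keys and their values reasonably quickly (< 500 ms).

Index and querying: 190 ms

I started by adding an index: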

CREATE INDEX ON analysis USING GIN (x);

This makes querying based on values in the "x" dictionary fast, such as this query:

SELECT date, name, country FROM analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;

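This takes ~190 ms (acceptable for us).

Retrieving dictionary values

However, once I start adding keys to return in the SELECT part, execution time rises almost linearly:

1 value: 300 ms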

select jsonb_extract_path(x, 'a_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;

Takes 366 ms (+175 ms)

select x#>'{a_dictionary_key}' as gear_down_altitude from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;

Takes 300 ms (+110 ms)

3 values: 600 ms

select jsonb_extract_path(x, 'a_dictionary_key'), jsonb_extract_path(x, 'a_second_dictionary_key'), jsonb_extract_path(x, 'a_third_dictionary_key') from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100;

Takes 600 ms (+410 ms, or +100 ms for each value selected)

select x#>'{a_dictionary_key}' as a_dictionary_key, x#>'{a_second_dictionary_key}' as a_second_dictionary_key, x#>'{a_third_dictionary_key}' as a_third_dictionary_key from analysis where date > '2014-01-01' and date < '2014-05-01' and cast(x#>> '{a_dictionary_key}' as float) > 100 ;

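Takes 600 ms (+410 ms, or +100 ms for each value selected)

Retrieving multiple values faster

Is there a way to retrieve multiple values from a jsonb field, as you would with this imaginary function: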
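The near-linear growth can be checked with a little arithmetic on the timings reported above. The sketch below uses only the question's own figures for the #> variant (190 ms baseline, 300 ms for one value, 600 ms for three); the names `timings_ms`, `extra_ms` and `marginal` are illustrative, not part of the original post.

```python
# Timings reported in the question, in milliseconds, keyed by the number of
# jsonb values selected. 0 is the baseline query returning date, name, country.
timings_ms = {0: 190, 1: 300, 3: 600}


def extra_ms(n_values):
    """Time added on top of the baseline when selecting n_values jsonb values."""
    return timings_ms[n_values] - timings_ms[0]


for n in (1, 3):
    print(f"{n} value(s): +{extra_ms(n)} ms total, "
          f"{extra_ms(n) / n:.0f} ms per value on average")

# Marginal cost of each additional value when going from 1 to 3 values.
marginal = (timings_ms[3] - timings_ms[1]) / (3 - 1)
print(f"1 -> 3 values: {marginal:.0f} ms per additional value")
```

On these figures each extra value costs somewhat more than the roughly 100 ms the question rounds to, which is consistent with the cost scaling with the number of extracted keys rather than being a fixed per-row overhead.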

jsonb_extract_path(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key'])

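That could possibly speed up these lookups. It could return them as columns, as a list/array, or even as a JSON object.

Retrieving an array using PL/Python

Just for the heck of it I wrote a custom function using PL/Python, but that was much slower (5 s+), possibly due to json.loads: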

CREATE OR REPLACE FUNCTION retrieve_objects(data jsonb, k VARCHAR[])
RETURNS TEXT[] AS $$
    # NULL or empty input: nothing to extract
    if not data:
        return []

    import simplejson as json
    # data arrives as the jsonb value's text form, so it is parsed on every call
    j = json.loads(data)

    # collect the requested keys in the order given
    l = []
    for i in k:
        l.append(j[i])

    return l

$$ LANGUAGE plpython2u;

-- Usage:
-- select retrieve_objects(x, ARRAY['a_dictionary_key', 'a_second_dictionary_key', 'a_third_dictionary_key']) from analysis where date > '2014-01-01' and date < '2014-05-01'
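The body of the function can be exercised outside the database to confirm what it returns. This is a minimal sketch, assuming the jsonb value reaches Python as JSON text (as the json.loads call above implies); it uses the standard-library json module in place of simplejson, and the sample document is made up for illustration.

```python
import json


def retrieve_objects(data, keys):
    """Return the values stored under `keys`, in order, from a JSON text document."""
    if not data:
        return []
    # One full parse per call, however many keys are requested.
    doc = json.loads(data)
    return [doc[key] for key in keys]


# Hypothetical sample row; the real dictionaries hold 90+ keys.
sample = (
    '{"a_dictionary_key": 120.5, '
    '"a_second_dictionary_key": 3, '
    '"a_third_dictionary_key": "x"}'
)

print(retrieve_objects(sample, ["a_dictionary_key", "a_third_dictionary_key"]))
```

Because the whole document is parsed once per call, requesting more keys adds little work on the Python side; if json.loads is indeed the bottleneck, the 5 s+ would come mostly from re-parsing every row's document rather than from the key lookups themselves.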

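Update 2015-05-21

I re-implemented this using an hstore table with a GIN index, and the performance is almost identical to using jsonb, i.e. it did not help in my case.

Source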

2015-05-19 Niklas B

You're using the #> operator, which looks like it performs a path search. Have you tried a normal -> lookup? Like:

select json_column->'json_field1' 
,  json_column->'json_field2'

It would be interesting to see what happens if you used a temporary table. For example:

create temporary table tmp_doclist (doc jsonb) 
; 
insert into tmp_doclist 
     (doc) 
select x 
from analysis 
where ... your conditions here ... 
; 
select doc->'col1' 
,  doc->'col2' 
,  doc->'col3' 
from tmp_doclist 
;

Source

2015-05-19 13:41:24 Andomar

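For 3 columns, using '->' takes 560 ms and using '#>' takes 580 ms. I also tried a temporary table, using 'select x into ....' and then selecting from the doclist, which takes 650 ms.

Does the timing on the temporary table change when you add/remove columns? 100 ms also seems odd for a dictionary lookup; you'd expect that to complete in nanoseconds. – Andomar

Creating the temp table takes 200 ms; after that, 1 value takes 150 ms, 2 values 285 ms, and 3 values 460 ms. I wonder whether PostgreSQL performs a JSON decode for each query, but that shouldn't be the case (especially not for jsonb).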
