2014-01-13

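How to query and index deeply nested, multi-level JSON data in PostgreSQL 9.3+?

In PostgreSQL 9.3, I am storing some fairly complex JSON objects with arrays nested inside arrays. This snippet is not the real data, but it illustrates the same concept: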

{
    "customerId" : "12345",
    "orders" : [{
        "orderId" : "54321",
        "lineItems" : [{
            "productId" : "abc",
            "qty" : 3
        }, {
            "productId" : "def",
            "qty" : 1
        }]
    }]
}

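I want to be able to run SQL queries against the lineItem objects not just in this single JSON structure, but across all of the JSON objects in that table column. For example, a SQL query that returns every distinct productId together with its total qty sold. To keep such a query from running all day, I will probably need an index on lineItem or on its child fields.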

Using this StackOverflow question, I figured out how to write a query that works:

SELECT
    line_item->>'productId' AS product_id,
    SUM(CAST(line_item->>'qty' AS INTEGER)) AS qty_sold
FROM
    my_table,
    -- "order" is a reserved word in PostgreSQL, so it cannot be used as an alias here
    json_array_elements(my_table.my_json_column->'orders') AS ord,
    json_array_elements(ord->'lineItems') AS line_item
GROUP BY product_id;

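However, the original StackOverflow question dealt with data nested only one level deep, not two. I extended the same concept (i.e., the "lateral join" in the FROM clause) by adding an extra lateral join to go one level deeper. But I am not sure this is the best approach, so the first part of my question is: what is the best way to query JSON data that sits an arbitrary number of levels deep inside a JSON object?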

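The second part is creating an index on such nested data. This StackOverflow question again deals with data nested only one level deep, and I am completely lost on how to apply it to deeper levels. Can anyone offer a clear approach to indexing data that is at least two levels deep, such as the lineItems above?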

Answer


To handle an arbitrary nesting depth, you need a recursive CTE that operates on every individual JSON element in every table row:

WITH RECURSIVE

raw_json AS (
    -- sample data; in row 2 "orders" sits one level deeper
    SELECT *
    FROM (VALUES
        (1, '{
            "customerId": "12345",
            "orders": [
                {
                    "orderId": "54321",
                    "lineItems": [
                        { "productId": "abc", "qty": 3 },
                        { "productId": "def", "qty": 1 }
                    ]
                }
            ]
        }'::json),
        (2, '{
            "customerId": "678910",
            "arbitraryLevel": {
                "orders": [
                    {
                        "orderId": "55345",
                        "lineItems": [
                            { "productId": "abc", "qty": 3 },
                            { "productId": "ghi", "qty": 10 }
                        ]
                    }
                ]
            }
        }'::json)
    ) a(id, sample_json)
),

json_recursive AS (
    -- non-recursive term: start from the top-level element of each row
    SELECT
        a.id,
        b.k,
        b.v,
        b.json_type,
        -- track any arbitrary id while iterating through the JSON graph
        CASE WHEN b.json_type = 'object' AND b.v->>'customerId' IS NOT NULL THEN b.v->>'customerId' ELSE a.customer_id END AS customer_id,
        CASE WHEN b.json_type = 'object' AND b.v->>'orderId' IS NOT NULL THEN b.v->>'orderId' ELSE a.order_id END AS order_id,
        CASE WHEN b.json_type = 'object' AND b.v->>'productId' IS NOT NULL THEN b.v->>'productId' ELSE a.product_id END AS product_id
    FROM (
        SELECT
            id,
            sample_json AS v,
            -- the accessor function to use depends on the type; 9.3 has no json_typeof() (added in 9.4)
            CASE left(sample_json::text, 1)
                WHEN '[' THEN 'array'
                WHEN '{' THEN 'object'
                ELSE 'scalar'
            END AS json_type,
            sample_json->>'customerId' AS customer_id,
            sample_json->>'orderId' AS order_id,
            sample_json->>'productId' AS product_id
        FROM raw_json
    ) a
    CROSS JOIN LATERAL (
        -- object: get the key/value pairs of its members
        SELECT
            b.k,
            b.v,
            CASE left(b.v::text, 1)
                WHEN '[' THEN 'array'
                WHEN '{' THEN 'object'
                ELSE 'scalar'
            END AS json_type
        FROM json_each(CASE a.json_type WHEN 'object' THEN a.v END) b(k, v)
        UNION ALL
        -- array: just get the elements (a top-level array has no key)
        SELECT
            NULL::text AS k,
            c.v,
            CASE left(c.v::text, 1)
                WHEN '[' THEN 'array'
                WHEN '{' THEN 'object'
                ELSE 'scalar'
            END AS json_type
        FROM json_array_elements(CASE a.json_type WHEN 'array' THEN a.v END) c(v)
    ) b

    UNION ALL -- recursive term: descend one level below every row found so far

    SELECT
        a.id,
        b.k,
        b.v,
        b.json_type,
        CASE WHEN b.json_type = 'object' AND b.v->>'customerId' IS NOT NULL THEN b.v->>'customerId' ELSE a.customer_id END AS customer_id,
        CASE WHEN b.json_type = 'object' AND b.v->>'orderId' IS NOT NULL THEN b.v->>'orderId' ELSE a.order_id END AS order_id,
        CASE WHEN b.json_type = 'object' AND b.v->>'productId' IS NOT NULL THEN b.v->>'productId' ELSE a.product_id END AS product_id
    FROM json_recursive a
    CROSS JOIN LATERAL (
        SELECT
            b.k,
            b.v,
            CASE left(b.v::text, 1)
                WHEN '[' THEN 'array'
                WHEN '{' THEN 'object'
                ELSE 'scalar'
            END AS json_type
        FROM json_each(CASE a.json_type WHEN 'object' THEN a.v END) b(k, v)
        UNION ALL
        -- array elements inherit the parent key
        SELECT
            a.k,
            c.v,
            CASE left(c.v::text, 1)
                WHEN '[' THEN 'array'
                WHEN '{' THEN 'object'
                ELSE 'scalar'
            END AS json_type
        FROM json_array_elements(CASE a.json_type WHEN 'array' THEN a.v END) c(v)
    ) b
)

After that, you can sum "qty" by any of the tracked ids by appending a query to the CTE, for example per customer:

SELECT
    customer_id,
    sum(v::text::integer) AS qty
FROM json_recursive
WHERE k = 'qty'
GROUP BY customer_id;
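For the question's own example (each distinct productId with its total qty), the same recursive walk can be written more compactly on PostgreSQL 9.4+, where json_typeof() replaces the left(v::text, 1) test. This is a self-contained sketch under that 9.4+ assumption; the CTE names raw_json and walk are illustrative, the sample rows mirror the ones above, and unlike the CTE above it tracks only product_id.

```sql
-- Sketch, assuming PostgreSQL 9.4+ for json_typeof(); sample rows mirror the answer's data.
WITH RECURSIVE
raw_json(id, doc) AS (
    VALUES
        (1, '{"customerId":"12345","orders":[{"orderId":"54321","lineItems":[{"productId":"abc","qty":3},{"productId":"def","qty":1}]}]}'::json),
        (2, '{"customerId":"678910","arbitraryLevel":{"orders":[{"orderId":"55345","lineItems":[{"productId":"abc","qty":3},{"productId":"ghi","qty":10}]}]}}'::json)
),
walk(id, k, v, product_id) AS (
    -- start with the whole document of every row
    SELECT id, NULL::text, doc, NULL::text FROM raw_json
    UNION ALL
    SELECT
        w.id,
        e.k,
        e.v,
        -- remember the nearest enclosing productId, as the answer's CTE does
        COALESCE(CASE WHEN json_typeof(e.v) = 'object' THEN e.v->>'productId' END, w.product_id)
    FROM walk w
    CROSS JOIN LATERAL (
        -- objects: descend into each key/value pair (json_each(NULL) yields no rows)
        SELECT key, value
        FROM json_each(CASE json_typeof(w.v) WHEN 'object' THEN w.v END)
        UNION ALL
        -- arrays: descend into each element, keeping the parent key
        SELECT w.k, value
        FROM json_array_elements(CASE json_typeof(w.v) WHEN 'array' THEN w.v END)
    ) AS e(k, v)
)
SELECT product_id, sum(v::text::integer) AS qty_sold
FROM walk
WHERE k = 'qty'
GROUP BY product_id
ORDER BY product_id;
```

On the two sample rows this should return abc 6, def 1 and ghi 10. Against a real table, replace raw_json with a SELECT of the id and JSON column from that table.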

Or you can fetch the "lineItem" objects themselves and work with them however you like:

SELECT *
FROM json_recursive
WHERE k = 'lineItems' AND json_type = 'object';

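As for indexing, you can adapt the recursive query into a function that returns the distinct keys of every JSON object in a row of the original table, and then create a functional index on your JSON column (the function must be declared IMMUTABLE for PostgreSQL to accept it in an index). The key-collecting part of the query looks like this: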

SELECT array_agg(DISTINCT k)
FROM json_recursive
WHERE k IS NOT NULL;
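A minimal sketch of that idea, assuming PostgreSQL 9.4+ (for json_typeof()) and the my_table / my_json_column names from the question. The function name json_all_keys, the index name, and the table definition are hypothetical, added only so the sketch runs on its own. The function walks one document with the same recursive pattern, returns its distinct keys as a text[], and is declared IMMUTABLE so it can back an expression index; a GIN index on the resulting array then serves @> (contains) lookups.

```sql
-- Hypothetical table shape, assumed from the question's names, only so the sketch runs standalone.
CREATE TABLE IF NOT EXISTS my_table (
    id             serial PRIMARY KEY,
    my_json_column json
);

-- Hypothetical helper: every distinct key found at any depth of one JSON document.
-- IMMUTABLE is required for a function used in an index expression.
CREATE OR REPLACE FUNCTION json_all_keys(doc json)
RETURNS text[]
LANGUAGE sql IMMUTABLE AS $$
    WITH RECURSIVE walk(k, v) AS (
        SELECT NULL::text, doc
        UNION ALL
        SELECT e.k, e.v
        FROM walk w
        CROSS JOIN LATERAL (
            -- objects: descend into each key/value pair
            SELECT key, value
            FROM json_each(CASE json_typeof(w.v) WHEN 'object' THEN w.v END)
            UNION ALL
            -- arrays: descend into each element, keeping the parent key
            SELECT w.k, value
            FROM json_array_elements(CASE json_typeof(w.v) WHEN 'array' THEN w.v END)
        ) AS e(k, v)
    )
    SELECT array_agg(DISTINCT k ORDER BY k) FROM walk WHERE k IS NOT NULL
$$;

-- GIN index over the key array; the default text[] operator class supports @> and &&.
CREATE INDEX my_table_json_keys_idx
    ON my_table USING gin (json_all_keys(my_json_column));

-- A lookup that can use the index: rows whose JSON contains a "lineItems" key at any depth.
SELECT id
FROM my_table
WHERE json_all_keys(my_json_column) @> ARRAY['lineItems'];
```

Such an index only answers "which rows contain this key somewhere". To filter on values, such as a particular productId, a function of the same shape would have to return those values instead of the keys.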