2017-07-10

How to query an array of structs in Hive (get_json_object) or with a JSON SerDe

I want to query the sample JSON file below, which is stored in my HDFS:

{ 
    "tag1": "1.0", 
    "tag2": "blah", 
    "tag3": "blahblah", 
    "tag4": { 
     "tag4_1": [{ 
       "tag4_1_1": [{ 
         "tag4_1_1_1": { 
          "Addr": { 
           "Addr1": "blah", 
           "City": "City", 
           "StateProvCd": "NY", 
           "PostalCode": "99999" 
          } 
         }
        }, {
         "tag4_1_1_1": { 
          "Addr": { 
           "Addr1": "blah2", 
           "City": "City2", 
           "StateProvCd": "NY", 
           "PostalCode": "99999" 
          } 
         } 
        } 
       ] 
      } 
     ] 
    } 
} 

I read the data by creating the following external table:
CREATE EXTERNAL TABLE DB.hv_table 
(
    tag1 string 
, tag2 string 
, tag3 string 
, tag4 struct<tag4_1:ARRAY<struct<tag4_1_1:ARRAY<struct<tag4_1_1_1:struct<Addr:struct<
       Addr1:string 
       , City:string 
       , StateProvCd:string 
       , PostalCode:string>>>>>>> 
) 
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' 
LOCATION 'HDFS/location'; 

Ideally, I would like to query the data and have it returned to me like this:

select tag1, tag2, tag3, tag4(all data) from DB.hv_table; 

Can someone provide an example of how I can query this without writing it in the following way:

select tag1, tag2, tag3 
, tag4.tag4_1[0].tag4_1_1[0].tag4_1_1_1.Addr.Addr1 as Addr1 
, tag4.tag4_1[0].tag4_1_1[0].tag4_1_1_1.Addr.City as City 
, tag4.tag4_1[0].tag4_1_1[0].tag4_1_1_1.Addr.StateProvCd as StateProvCd 
, tag4.tag4_1[0].tag4_1_1[0].tag4_1_1_1.Addr.PostalCode as PostalCode 
from DB.hv_table 

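Most importantly, I do not want to hard-code the number of array elements. In my example I can only target the first element of the array (tag4_1_1_1). If possible, I would like to target all of them.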

Answer

I found a good blog on this: ThornyDev

CREATE EXTERNAL TABLE IF NOT EXISTS DB.dummyTable (jsonBlob STRING) 
LOCATION 'pathOfYourFiles'; 

SELECT 
get_json_object(jsonBlob, '$.tag1') AS tag1 
,get_json_object(jsonBlob, '$.tag2') AS tag2 
,get_json_object(jsonBlob, '$.tag3') AS tag3 
,get_json_object(jsonBlob, '$.tag4.tag4_1.tag4_1_1.tag4_1_1_1.Addr.Addr1') AS Addr1 
,get_json_object(jsonBlob, '$.tag4.tag4_1.tag4_1_1.tag4_1_1_1.Addr.City') AS City 
,get_json_object(jsonBlob, '$.tag4.tag4_1.tag4_1_1.tag4_1_1_1.Addr.StateProvCd') AS StateProvCd 
,get_json_object(jsonBlob, '$.tag4.tag4_1.tag4_1_1.tag4_1_1_1.Addr.PostalCode') AS PostalCode 
FROM DB.dummyTable 
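
For comparison, the SerDe table from the question can also be queried without hard-coded array indexes by using LATERAL VIEW explode, which yields one row per array element. This is only a sketch: it assumes DB.hv_table is defined as in the question with Addr declared as a struct, and the aliases o, i, outer_elem and inner_elem are made-up names for illustration.

```sql
-- Sketch: one output row per element of the innermost array (tag4_1_1).
-- Assumes DB.hv_table from the question; all aliases are illustrative.
SELECT t.tag1, t.tag2, t.tag3
     , inner_elem.tag4_1_1_1.Addr.Addr1       AS Addr1
     , inner_elem.tag4_1_1_1.Addr.City        AS City
     , inner_elem.tag4_1_1_1.Addr.StateProvCd AS StateProvCd
     , inner_elem.tag4_1_1_1.Addr.PostalCode  AS PostalCode
FROM DB.hv_table t
-- First level: explode the outer array tag4.tag4_1
LATERAL VIEW explode(t.tag4.tag4_1) o AS outer_elem
-- Second level: explode the nested array tag4_1_1 of each outer element
LATERAL VIEW explode(outer_elem.tag4_1_1) i AS inner_elem;
```

Note that a plain LATERAL VIEW drops source rows whose array is empty or NULL; LATERAL VIEW OUTER keeps them with NULL columns if that matters for your data.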

I am satisfied with the get_json_object approach, but I would like to try json_tuple and see how it performs against get_json_object.
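
A json_tuple version might look roughly like the sketch below. json_tuple only extracts top-level keys and returns them as strings, so the nested tag4 value still needs get_json_object (or a further json_tuple). The alias j is an assumption for illustration, and the nested path simply reuses the pattern from the query above.

```sql
-- Sketch: json_tuple pulls several top-level keys in one call;
-- tag4 comes back as a JSON string and is drilled into separately.
-- Assumes DB.dummyTable(jsonBlob STRING) as created above.
SELECT j.tag1, j.tag2, j.tag3
     , get_json_object(j.tag4, '$.tag4_1.tag4_1_1.tag4_1_1_1.Addr.Addr1')       AS Addr1
     , get_json_object(j.tag4, '$.tag4_1.tag4_1_1.tag4_1_1_1.Addr.City')        AS City
     , get_json_object(j.tag4, '$.tag4_1.tag4_1_1.tag4_1_1_1.Addr.StateProvCd') AS StateProvCd
     , get_json_object(j.tag4, '$.tag4_1.tag4_1_1.tag4_1_1_1.Addr.PostalCode')  AS PostalCode
FROM DB.dummyTable d
LATERAL VIEW json_tuple(d.jsonBlob, 'tag1', 'tag2', 'tag3', 'tag4') j
  AS tag1, tag2, tag3, tag4;
```

The Hive documentation describes json_tuple as more efficient than repeated get_json_object calls when several keys are read from the same JSON string; whether that is noticeable here would have to be measured on the actual data.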