Extracting an array of field values from an array of structs in Hive

I have an external table in Hive:

CREATE EXTERNAL TABLE FOO ( 
    TS string, 
    customerId string, 
    products array< struct <productCategory:string, productId:string> > 
) 
PARTITIONED BY (ds string) 
ROW FORMAT SERDE 'some.serde' 
WITH SERDEPROPERTIES ('error.ignore'='true') 
LOCATION 'some_locations' 
;

A record in the external table can hold data such as:

1340321132000, 'some_company', [{"productCategory":"footwear","productId":"nik3756"},{"productCategory":"eyewear","productId":"oak2449"}]

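Does anyone know if there is a simple way to extract every productCategory from this record and return them as a productCategories array, without using explode? Something like the following: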

["footwear", "eyewear"]

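Or do I need to write my own GenericUDF? If so, I don't know much Java (I'm a Ruby person), so could someone give me some hints? I have read some of the instructions on UDFs from Apache Hive, but I don't know which collection type is best for handling the array, and which type should be used to handle the struct.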

===

I have more or less answered this question by writing a GenericUDF, but I ran into 2 other problems. They are described in this SO question.

Source

2013-03-26 pchu

If the size of the array is fixed (for example, 2), try:

products[0].productCategory,products[1].productCategory

If it is not fixed, a UDF should be the right solution. I think you can do this in JRuby. Good luck!
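For the fixed-size case, the two field accesses can be wrapped in Hive's built-in array() constructor so that the result comes back as a single array column. This is only a sketch against the FOO table from the question; the alias productCategories is taken from the question's wording, and the query assumes every record has exactly two products.

```sql
-- Sketch: assumes products always holds exactly two elements.
SELECT
    TS,
    customerId,
    array(
        products[0].productCategory,
        products[1].productCategory
    ) AS productCategories
FROM FOO;
```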

Source

2013-03-26 07:20:21 www

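Thanks, but the size of the array is not fixed. Using JRuby is a nice idea, but a GenericUDF has to be written in Java. Worse, there is not much reference material on writing a GenericUDF. – pchu 2013-03-26 12:23:14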

One way is to use either the inline or the explode function, like this:

SELECT 
    TS, 
    customerId, 
    pCat, 
    pId
FROM FOO 
LATERAL VIEW inline(products) p AS pCat, pId
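If the goal is one array per record rather than one row per product, the flattened rows can be aggregated back together. The following is a sketch only: it assumes collect_list is available (it was added in Hive 0.13) and that TS plus customerId identifies a record. Hive does not guarantee the order of the collected elements.

```sql
-- Sketch: flatten with inline(), then regroup the categories per record.
-- Assumes (TS, customerId) uniquely identifies a record in FOO.
SELECT
    TS,
    customerId,
    collect_list(p.pCat) AS productCategories
FROM FOO
LATERAL VIEW inline(products) p AS pCat, pId
GROUP BY TS, customerId;
```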

Otherwise, you can write a UDF. Check out this post and this post, along with the following resources:

Source

2016-02-29 01:39:24 chorbs

You can use a JSON SerDe or the built-in functions get_json_object and json_tuple.
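As a rough illustration of the built-in route, get_json_object accepts a JSONPath-like expression in which `[*]` acts as a wildcard over array elements. The table raw_json_lines and its column json_line are hypothetical names, assumed here to hold one JSON document per row as a plain string. Note that the function returns a string containing JSON, not a Hive array.

```sql
-- Sketch: raw_json_lines(json_line string) is a hypothetical table
-- holding one JSON document per row.
SELECT
    get_json_object(json_line, '$.Orders[*].ItemId') AS item_ids
FROM raw_json_lines;
```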

With rcongiu's Hive-JSON SerDe, the usage would be as follows.

Define the table:

CREATE TABLE complex_json (
DocId string, 
Orders array<struct<ItemId:int, OrderDate:string>>)

Load sample JSON into it (it is important that this data is on a single line):

{"DocId":"ABC","Orders":[{"ItemId":1111,"OrderDate":"11/11/2012"},{"ItemId":2222,"OrderDate":"12/12/2012"}]}

Then extracting the item IDs is as simple as:

SELECT Orders.ItemId FROM complex_json LIMIT 100;

It will return the list of IDs for you:

itemid
[1111,2222]

I have verified that this returns the correct result in my environment. Full listing:

add jar hdfs:///tmp/json-serde-1.3.6.jar; 

CREATE TABLE complex_json (
    DocId string, 
    Orders array<struct<ItemId:int, OrderDate:string>> 
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; 

LOAD DATA INPATH '/tmp/test.json' OVERWRITE INTO TABLE complex_json; 

SELECT Orders.ItemId FROM complex_json LIMIT 100;
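The same field access on an array of structs can be applied to the table from the original question. This is a sketch, assuming the SerDe configured for FOO exposes products as a genuine array<struct<...>> column. The alias productCategories comes from the question.

```sql
-- Sketch: accessing a struct field through the array column yields
-- an array of that field's values, one array per record.
SELECT
    TS,
    customerId,
    products.productCategory AS productCategories
FROM FOO;
```

This avoids both explode and a custom GenericUDF, because Hive resolves the field against each element of the array.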

Source

2016-02-29 16:04:03 Viktor
