2017-07-21

Creating a Hive table for nested JSON data

I am unable to load nested JSON data into a Hive table. Can someone help me? Here is what I have tried:

Sample input:

{"DocId":"ABC","User1":{"Id":1234,"Username":"sam1234","Name":"Sam","ShippingAddress":{"Address1":"123 Main St.","Address2":null,"City":"Durham","State":"NC"},"Orders":[{"ItemId":6789,"OrderDate":"11/11/2012"},{"ItemId":4352,"OrderDate":"12/12/2012"}]}} 

In Hive (CDH3):

ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar; 

CREATE TABLE json_tab(
    DocId string, 
    user1 struct<Id: int, Username: string, Name:string,ShippingAddress:struct<address1:string,address2:string,city:string,state:string>,orders:array<struct<ItemId:int,orderdate:string>>> 
) 
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe' 
STORED AS TEXTFILE; 

hive> select * from json_tab; 
OK 
NULL null 

I am getting NULLs here.

I also tried with the HCatalog jar:

ADD JAR /home/training/Desktop/hcatalog-core-0.11.0.jar; 

CREATE TABLE json_tab(
    DocId string, 
    user1 struct<Id: int, Username: string, Name:string,ShippingAddress:struct<address1:string,address2:string,city:string,state:string>,orders:array<struct<ItemId:int,orderdate:string>>> 
) 
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'; 

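But my CREATE TABLE statement fails with the error below:

FAILED: Error in metadata: Cannot validate serde: org.apache.hive.hcatalog.data.JsonSerDe
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Can someone help me? Thanks in advance for your help.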

Answers

3

You can use the org.openx.data.jsonserde.JsonSerDe class to read the JSON data.

Download the jar file from http://www.congiu.net/hive-json-serde/1.3.6-SNAPSHOT/cdh4/ and follow the steps below:

add jar /path/to/jar/json-serde-1.3.6-jar-with-dependencies.jar; 

CREATE TABLE json_tab(
    DocId string, 
    user1 struct<Id: int, Username: string, Name:string,ShippingAddress:struct<address1:string,address2:string,city:string,state:string>,orders:array<struct<ItemId:int,orderdate:string>>> 
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; 

LOAD DATA LOCAL INPATH '/path/to/data/nested.json' INTO TABLE json_tab; 

SELECT DocId, User1.Id, User1.ShippingAddress.City as city, 
User1.Orders[0].ItemId as order0id, 
User1.Orders[1].ItemId as order1id from json_tab; 


Result:
ABC  1234 Durham 6789 4352 
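
The query above picks orders by fixed index, which only works when the number of orders is known. As a hedged sketch that is not part of the original answer, the orders array can instead be flattened into one row per order with LATERAL VIEW and explode. It assumes the json_tab table defined above; the aliases order_view, o, user_id, item_id and order_date are illustrative names only.

```sql
-- Sketch: one output row per element of user1.orders.
-- Assumes json_tab was created with the OpenX JsonSerDe as shown above.
SELECT DocId,
       user1.Id    AS user_id,
       o.ItemId    AS item_id,     -- field of the exploded order struct
       o.OrderDate AS order_date
FROM json_tab
LATERAL VIEW explode(user1.orders) order_view AS o;
```

For the sample record this should give two rows, one for item 6789 and one for item 4352.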
+1

Thanks. I tried the suggested jar with dependencies, but the CREATE TABLE statement throws this error: "FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector (Lorg/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils$PrimitiveTypeEntry;)V". Could you take a look and let me know what can be done? I tried on both CDH3 and CDH5. – user2531569

0
I was getting the same exception.

I added the jars below and it worked for me.

ADD JAR /home/cloudera/Data/json-serde-1.3.7.3.jar; 
ADD JAR /home/cloudera/Data/hive-hcatalog-core-0.13.0.jar;
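
To confirm the jars were picked up and to see which SerDe the table uses, something like the following can be run in the same Hive session. This is only a sketch, assuming the json_tab table from the answer above has already been created.

```sql
-- Show the jars registered in the current session.
LIST JARS;

-- The "SerDe Library" line in the output shows the SerDe class bound to the table.
DESCRIBE FORMATTED json_tab;
```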