2017-05-27

Hive cannot create a table from a nested Avro schema

I want to create a Hive table from a nested Avro schema, but it does not work. I am using Hive 1.1 on CDH 5.7.2.

Here is my nested Avro schema:

[ 
    { 
     "type": "record", 
     "name": "Id", 
     "namespace": "com.test.app_list", 
     "doc": "Device ID", 
     "fields": [ 
      { 
       "name": "idType", 
       "type": "int" 
      },{ 
       "name": "id", 
       "type": "string" 
      } 
     ] 
    }, 

    { 
     "type": "record", 
     "name": "AppList", 
     "namespace": "com.test.app_list", 
     "doc": "", 
     "fields": [ 
      { 
       "name": "appId", 
       "type": "string", 
       "avro.java.string": "String" 
      }, 
      { 
       "name": "timestamp", 
       "type": "long" 
      }, 

      { 
       "name": "idList", 
       "type": [{"type": "array", "items": "com.test.app_list.Id"}] 
      } 

     ] 
    } 
] 

And here is my SQL to create the table:

CREATE EXTERNAL TABLE app_list 
ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' 
TBLPROPERTIES (
'avro.schema.url'='/hive/schema/test_app_list.avsc'); 

But Hive gives me:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.avro.AvroSerdeException Schema for table must be of type RECORD. Received type: UNION) 

Yet the Hive documentation says the AvroSerDe "Supports arbitrarily nested schemas.": https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-Overview–WorkingwithAvrofromHive

Data sample:

{ 
    "appId":{"string":"com.test.app"}, 
    "timestamp":{"long":1495893601606}, 
    "idList":{ 
     "array":[ 
      {"idType":15,"id":"6c:5c:14:c3:a5:39"}, 
      {"idType":13,"id":"eb297afe56ff340b6bb7de5c5ab09193"} 
     ] 
    } 

} 

But I don't know how to do this. I need some help to solve the problem. Thanks!

I have added a data sample.

Answer

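The top level of your Avro schema must be a record type. Yours is a union (a JSON array of two records), which is why Hive rejects it. As a workaround, make the top level a single record and declare the two existing records as the types of fields inside it.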

{
    "type": "record",
    "name": "myRecord",
    "namespace": "com.test.app_list",
    "fields": [
        {
            "name": "id",
            "type": {
                "type": "record",
                "name": "Id",
                "doc": "Device ID",
                "fields": [
                    {
                        "name": "idType",
                        "type": "int"
                    },
                    {
                        "name": "id",
                        "type": "string"
                    }
                ]
            }
        },
        {
            "name": "appList",
            "type": {
                "type": "record",
                "name": "AppList",
                "doc": "",
                "fields": [
                    {
                        "name": "appId",
                        "type": "string",
                        "avro.java.string": "String"
                    },
                    {
                        "name": "timestamp",
                        "type": "long"
                    },
                    {
                        "name": "idList",
                        "type": [{"type": "array", "items": "com.test.app_list.Id"}]
                    }
                ]
            }
        }
    ]
}
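
The workaround can also be sketched as a small script. This is only an illustration, not part of Hive or Avro: the helper names `top_level_type` and `wrap_union_in_record` are hypothetical, the schema is abbreviated from the question, and deriving field names by lower-casing the first letter of each record name is an assumption. It uses only Python's standard `json` module and does not validate the schema against the Avro specification.

```python
import json


def top_level_type(schema):
    """Return the Avro type name of the outermost schema node."""
    if isinstance(schema, list):   # a JSON array at the top is an Avro union
        return "union"
    if isinstance(schema, dict):   # a JSON object carries its type in "type"
        return schema["type"]
    return schema                  # a bare string such as "int"


def wrap_union_in_record(union, name, namespace):
    """Wrap each named record of a top-level union as a field of one record.

    Assumption for illustration: field names are the record names with the
    first letter lower-cased (Id -> id, AppList -> appList).
    """
    fields = []
    for member in union:
        field_name = member["name"][0].lower() + member["name"][1:]
        fields.append({"name": field_name, "type": member})
    return {"type": "record", "name": name,
            "namespace": namespace, "fields": fields}


# Abbreviated version of the union schema from the question.
original = json.loads("""
[
  {"type": "record", "name": "Id", "namespace": "com.test.app_list",
   "fields": [{"name": "idType", "type": "int"},
              {"name": "id", "type": "string"}]},
  {"type": "record", "name": "AppList", "namespace": "com.test.app_list",
   "fields": [{"name": "appId", "type": "string"},
              {"name": "timestamp", "type": "long"},
              {"name": "idList",
               "type": [{"type": "array", "items": "com.test.app_list.Id"}]}]}
]
""")

wrapped = wrap_union_in_record(original, "myRecord", "com.test.app_list")

print(top_level_type(original))   # the kind of schema Hive rejected
print(top_level_type(wrapped))    # the kind of schema Hive asks for
print(json.dumps(wrapped, indent=2))
```

Note that wrapping changes the shape of each row, so data files already written with the original schema would presumably need to be rewritten, or the table schema chosen to match how the data was actually written.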

Thank you. I changed the schema as suggested, and it works.