We followed the steps below, but cannot query the records in Hive: when the data is stored in Avro format, it returns the "error_error ..." error.

  1. Imported the table from MySQL into the HDFS location user/hive/warehouse/orders/; the table schema is:

    mysql> describe orders; 
    +-------------------+-------------+------+-----+---------+-------+ 
    | Field             | Type        | Null | Key | Default | Extra | 
    +-------------------+-------------+------+-----+---------+-------+ 
    | order_id          | int(11)     | YES  |     | NULL    |       | 
    | order_date        | varchar(30) | YES  |     | NULL    |       | 
    | order_customer_id | int(11)     | YES  |     | NULL    |       | 
    | order_items       | varchar(30) | YES  |     | NULL    |       | 
    +-------------------+-------------+------+-----+---------+-------+ 
    
  2. Created an external table in Hive over the same data from (1).

    CREATE EXTERNAL TABLE orders 
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' 
    LOCATION 'hdfs:///user/hive/warehouse/retail_stage.db/orders' 
    TBLPROPERTIES ('avro.schema.url'='hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc'); 
    

    Sqoop command:

    sqoop import \ 
        --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \ 
        --username=root \ 
        --password=cloudera \ 
        --table orders \ 
        --target-dir /user/hive/warehouse/retail_stage.db/orders \ 
        --as-avrodatafile \ 
        --split-by order_id 
    
  3. The describe command on this table returns the error below; tried many combinations, but all failed.

    hive> describe orders; 
    OK 
    error_error_error_error_error_error_error string     from deserializer 
    cannot_determine_schema string     from deserializer 
    check     string     from deserializer 
    schema     string     from deserializer 
    url      string     from deserializer 
    and      string     from deserializer 
    literal     string     from deserializer 
    Time taken: 1.15 seconds, Fetched: 7 row(s) 
    

The same steps work with --as-textfile; the error is thrown only in the --as-avrodatafile case.
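
The text-file import that works is essentially the same command with only the format flag changed; roughly the following, where the target directory is illustrative rather than the one actually used:

    sqoop import \
        --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
        --username=root \
        --password=cloudera \
        --table orders \
        --target-dir /user/hive/warehouse/orders_text \
        --as-textfile \
        --split-by order_id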

I have gone through a few Stack Overflow posts but could not resolve it. Any ideas?

Answer


I think you should check the reference to the Avro schema file in TBLPROPERTIES.

Does it actually resolve? That is, does the following return the schema?

hdfs dfs -cat hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc
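
If it does not resolve (the Sqoop compile directory under /tmp may no longer exist), one option is to place the generated .avsc in a permanent HDFS location and point the table at it. A rough sketch, assuming the schema file is still available; the destination path is only an example:

    # copy the schema to a stable HDFS path (use hdfs dfs -put instead if it only exists locally)
    hdfs dfs -mkdir -p /user/cloudera/schemas
    hdfs dfs -cp /tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc /user/cloudera/schemas/orders.avsc

and then repoint the table in Hive:

    ALTER TABLE orders SET TBLPROPERTIES ('avro.schema.url'='hdfs:///user/cloudera/schemas/orders.avsc');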

I was able to reproduce the exact scenario and select from the Hive table:

hive> CREATE EXTERNAL TABLE sqoop_test 
    > COMMENT "A table backed by Avro data with the Avro schema stored in HDFS" 
    > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'  
    > STORED AS 
    > INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  
    > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' 
    > LOCATION '/user/cloudera/categories/'  
    > TBLPROPERTIES 
    > ('avro.schema.url'='hdfs:///user/cloudera/categories.avsc') 
    > ; 

OK 
Time taken: 1.471 seconds

hive> select * from sqoop_test; 
OK 
1 2 Football 
2 2 Soccer 
3 2 Baseball & Softball
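
For completeness: the categories.avsc referenced in TBLPROPERTIES was uploaded to HDFS before creating the table, along these lines (assuming the schema Sqoop generated for the categories import is available locally; the local path is illustrative):

    # upload the Sqoop-generated Avro schema to HDFS, then verify it resolves
    hdfs dfs -put ./categories.avsc /user/cloudera/categories.avsc
    hdfs dfs -cat hdfs:///user/cloudera/categories.avsc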