2015-02-05 66 views
0

Insert data into HBase using Hive (JSON file)

I have already created a table in HBase using Hive:

hive> CREATE TABLE hbase_table_emp(id int, name string, role string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:role") 
TBLPROPERTIES ("hbase.table.name" = "emp"); 

and created another table to load the data into:

hive> create table testemp(id int, name string, role string) row format delimited fields terminated by '\t'; 
hive> load data local inpath '/home/user/sample.txt' into table testemp; 

Finally, I inserted the data into the HBase table:

hive> insert overwrite table hbase_table_emp select * from testemp; 
hive> select * from hbase_table_emp; 
OK 
123 Ram  TeamLead 
456 Silva Member 
789 Krishna Member 
time taken: 0.160 seconds, Fetched: 3 row(s) 

The table looks like this in HBase:

hbase(main):002:0> scan 'emp' 
ROW     COLUMN+CELL            
123     column=cf1:name, timestamp=1422540225254, value=Ram  
123     column=cf1:role, timestamp=1422540225254, value=TeamLead 
456     column=cf1:name, timestamp=1422540225254, value=Silva  
456     column=cf1:role, timestamp=1422540225254, value=Member  
789     column=cf1:name, timestamp=1422540225254, value=Krishna  
789     column=cf1:role, timestamp=1422540225254, value=Member  
3 row(s) in 2.1230 seconds 

Can I do the same with a JSON file like this:

{"id": 123, "name": "Ram", "role":"TeamLead"} 
{"id": 456, "name": "Silva", "role":"Member"} 
{"id": 789, "name": "Krishna", "role":"Member"} 

by loading it with:

hive> load data local inpath '/home/user/sample.json' into table testemp; 

Please help! :)

Answer

2

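You can use the get_json_object function to parse each line as a JSON object. For example, first create a staging table that holds your raw JSON data, one object per row: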

DROP TABLE IF EXISTS staging; 
CREATE TABLE staging (json STRING); 
LOAD DATA LOCAL INPATH '/local/path/to/jsonfile' INTO TABLE staging; 

Then use get_json_object to extract the attributes you want to load into the table:

INSERT OVERWRITE TABLE hbase_table_emp SELECT 
    get_json_object(json, "$.id") AS id, 
    get_json_object(json, "$.name") AS name, 
    get_json_object(json, "$.role") AS role 
FROM staging; 

There is a more comprehensive discussion of this function here
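
As a possible alternative, Hive's built-in json_tuple function can extract several top-level keys from each row in a single call. This is only a sketch: it assumes the same staging table and hbase_table_emp table as above, and the aliases s and t and the explicit CAST on id are illustrative choices rather than part of the original answer:

```sql
-- Sketch: assumes staging(json STRING) holds one JSON object per row.
INSERT OVERWRITE TABLE hbase_table_emp
SELECT
    CAST(t.id AS INT) AS id,  -- json_tuple returns strings; cast to match the int column
    t.name,
    t.role
FROM staging s
LATERAL VIEW json_tuple(s.json, 'id', 'name', 'role') t AS id, name, role;
```

json_tuple only reads top-level keys, so get_json_object is still needed for nested paths such as "$.a.b".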

+0

Thanks for your help :) – 2015-02-06 09:24:35