2014-03-28 229 views
0

Hive query for nested JSON

I have the following JSON stored in Hive:

{"key":123,"c1":["s1","s2","s3"],"c2":{"k1":"v1","k2":"v2"}} 
{"key":456,"c1":["s4","s5","s6"],"c2":{"k3":"v3","k4":"v4"}} 

Now I want to write a Hive query against this JSON that produces the following output:

key c1 c1 c1 c2 c2 c2 c2
123 s1 s2 s3 k1 v1 k2 v2
456 s4 s5 s6 k3 v3 k4 v4

Is this possible in Hive, or is my output structure missing something?

+0

Which CREATE TABLE statement did you try? –

+0

@MukeshS I used the Mongo-to-Hive connector from the link below, https://github.com/mongodb/mongo-hadoop/tree/master/hive, so I am working with my Mongo documents in Hive. – Yogesh

+0

Hmm, I'm sorry, but I haven't worked with MongoDB, so I can't help you. I thought you were using only Hive and JSON. –

Answers

0

You can use the Brickhouse JSON UDFs (http://github.com/klout/brickhouse) to parse the JSON into a Hive struct and then access the values.

SELECT strct.key,
       strct.c1[0], strct.c1[1], strct.c1[2],
       map_keys(strct.c2)[0], map_values(strct.c2)[0],
       map_keys(strct.c2)[1], map_values(strct.c2)[1]
FROM (
    -- The named_struct is a template describing the expected types:
    -- an int key, an array of strings c1, and a string-to-string map c2.
    SELECT from_json(json_str,
                     named_struct("key", 0, "c1", array(""), "c2", map("", ""))) AS strct
    FROM json_table
) js;

For more information, read the Brickhouse Confessions blog post at http://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/
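If adding Brickhouse is not an option, Hive's built-in get_json_object function can pull out the values that sit at fixed paths. The following is only a sketch, assuming the same json_table with a STRING column json_str as in the query above; the column aliases are made up for illustration. Because the keys inside c2 differ from row to row, this version returns c2 as a raw JSON string rather than splitting it into key and value columns.

```sql
-- Sketch only: assumes json_table(json_str STRING), one JSON document per row.
-- get_json_object returns every extracted value as a string.
SELECT get_json_object(json_str, '$.key')   AS doc_key,
       get_json_object(json_str, '$.c1[0]') AS c1_0,
       get_json_object(json_str, '$.c1[1]') AS c1_1,
       get_json_object(json_str, '$.c1[2]') AS c1_2,
       -- c2's keys vary by row, so it is kept as a JSON string here
       get_json_object(json_str, '$.c2')    AS c2_json
FROM json_table;
```

This avoids an extra JAR, but each call parses the JSON string again, so the struct-based approach is likely the better fit when many fields are needed.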

-1

Posting an end-to-end solution. A step-by-step procedure for converting JSON into a Hive table:

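Step 1) Install Maven if it is not already installed

>$ sudo apt-get install maven

Step 2) Install Git if it is not already installed, then clone the Hive-JSON-Serde repository

>sudo git clone https://github.com/rcongiu/Hive-JSON-Serde.git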

Step 3) Go into the $HOME/Hive-JSON-Serde folder

Step 4) Build the SerDe package

>sudo mvn -Pcdh5 clean package

Step 5) The SerDe file will be at $HOME/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar

Step 6) Add the SerDe as a dependency JAR in Hive (substitute your actual home directory for $HOME)

hive> ADD JAR $HOME/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar;

Step 7) Create a JSON file at $HOME/books.json (example)

{"value": [{"id": "1","bookname": "A","properties": {"subscription": "1year","unit": "3"}},{"id": "2","bookname":"B","properties":{"subscription": "2years","unit": "5"}}]}

Step 8) Create the tmp1 table in Hive

hive>CREATE TABLE tmp1 (
     value ARRAY<struct<id:string,bookname:string,properties:struct<subscription:string,unit:string>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
    'mapping.value' = 'value'
)
STORED AS TEXTFILE;

Step 9) Load the data from the JSON file into the tmp1 table

hive>LOAD DATA LOCAL INPATH '$HOME/books.json' INTO TABLE tmp1;

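Step 10) Create a tmp2 table by exploding tmp1. This intermediate step breaks the multi-level JSON structure into multiple rows. Note: if your JSON structure is a simple single level, skip this step.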

hive>create table tmp2 as 
SELECT * 
FROM tmp1 
LATERAL VIEW explode(value) itemTable AS items; 
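
To see what explode does before materializing tmp2, the same LATERAL VIEW can be run as a plain SELECT. This is a minimal sketch, assuming tmp1 has already been loaded with the example books.json; each element of the value array becomes its own row, exposed through the items alias.

```sql
-- Preview the exploded rows; nothing is written.
-- Each struct in the value array is emitted as one row under the alias items.
SELECT items.id, items.bookname
FROM tmp1
LATERAL VIEW explode(value) itemTable AS items;
```

With the example file, which holds a single JSON document containing two books, this should return one row per book.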

Step 11) Create the Hive table and load the values from the tmp2 table

hive>create table books as
select items.id as id, items.bookname as name, items.properties.subscription as subscription, items.properties.unit as unit from tmp2;
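
If the intermediate table is not needed for anything else, the explode and the projection can be combined in a single statement. This is a sketch, assuming tmp1 as defined in the earlier step; books_direct is a hypothetical table name chosen so it does not clash with books.

```sql
-- Sketch: build the final table straight from tmp1, without tmp2.
-- book is the alias for each exploded array element.
CREATE TABLE books_direct AS
SELECT book.id                      AS id,
       book.bookname                AS name,
       book.properties.subscription AS subscription,
       book.properties.unit         AS unit
FROM tmp1
LATERAL VIEW explode(value) itemTable AS book;
```

Keeping tmp2 can still be worthwhile when several downstream tables are derived from the same exploded rows.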

Step 12) Drop the temporary tables

hive>drop table tmp1; 
hive>drop table tmp2; 

Step 13) Test the Hive table

hive>select * from books; 

Output:

id name subscription unit

1 A 1year 3

2 B 2years 5