2015-07-02 86 views

A sample JSON file is shown below. I need version, overrideable, scenario, repairType, rank, and notificationType to be parsed out of it with a Hive query.

Please suggest a Hive query that does not require adding any new jar.

{ 
    "channelOutcome": { 
     "MG": { 
      "repairStrategies": [ 
       { 
        "scenario": "1", 
        "repairType": "ISR", 
        "rank": 1, 
        "notificationType": "Z5" 
       }, 
       { 
        "scenario": "1", 
        "repairType": "SER", 
        "rank": 2, 
        "notificationType": "NO" 
       }, 
       { 
        "scenario": "1", 
        "repairType": "ACC", 
        "rank": 3, 
        "notificationType": "Z5" 
       }, 
       { 
        "scenario": "1", 
        "repairType": "SWP", 
        "rank": 4, 
        "notificationType": "Z5" 
       }, 
       { 
        "scenario": "4", 
        "repairType": "RMS", 
        "rank": 5, 
        "notificationType": "Z8" 
       } 
      ], 
      "overrideable": false 
     } 
    }, 
    "keyValues": [], 
    "version": 2.3 
    } 

This can be done easily by adding the external SerDe from https://github.com/rcongiu/Hive-JSON-Serde. See these examples, which are easy to solve with an external jar: http://thornydev.blogspot.in/2013/07/querying-json-records-via-hive.html
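For comparison, the SerDe route mentioned in this comment might look roughly like the sketch below. It does require the external jar, so it does not meet the "no new jar" constraint of the question. The jar path, the table name repair_json, the HDFS location, and the aliases t and rs are all hypothetical; only the SerDe class name comes from the Hive-JSON-Serde project.

```sql
-- Hypothetical jar path: adjust to wherever the SerDe jar actually lives.
ADD JAR /path/to/json-serde-with-dependencies.jar;

-- Hypothetical table name and location. The column types mirror the sample JSON;
-- the unused "keyValues" key is simply not declared.
CREATE EXTERNAL TABLE repair_json (
    channelOutcome STRUCT<
        MG: STRUCT<
            repairStrategies: ARRAY<STRUCT<
                scenario: STRING,
                repairType: STRING,
                rank: INT,
                notificationType: STRING>>,
            overrideable: BOOLEAN>>,
    version DOUBLE
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/data/repair_json';

-- One output row per element of the repairStrategies array.
SELECT
    version,
    channelOutcome.MG.overrideable AS overrideable,
    rs.scenario,
    rs.repairType,
    rs.rank,
    rs.notificationType
FROM repair_json
LATERAL VIEW explode(channelOutcome.MG.repairStrategies) t AS rs;
```

With this approach the array is a real Hive ARRAY, so no string manipulation is needed before explode.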

Answer

Query for the JSON file, without using any external jar file :)

-- Step 4: pull the individual fields out of each strategy object.
SELECT
    three.version,
    three.overrideable,
    get_json_object(three.strategy, '$.scenario') AS scenario,
    get_json_object(three.strategy, '$.repairType') AS repairType,
    get_json_object(three.strategy, '$.rank') AS rank,
    get_json_object(three.strategy, '$.notificationType') AS notificationType
FROM (
    -- Step 3: explode the array so that every strategy gets its own row.
    SELECT s.version, s.overrideable, strategy
    FROM (
        -- Step 2b: split the string on the '|' separator inserted in step 2a.
        SELECT two.version AS version,
               two.overrideable AS overrideable,
               split(two.repairStrategies, '\\|') AS rs_array
        FROM (
            -- Step 2a: strip the square brackets, then turn every "},{" into "}|{".
            SELECT one.version,
                   one.overrideable AS overrideable,
                   regexp_replace(regexp_replace(one.repairStrategies, '\\[|\\]', ''), '\\}\\,\\{', '\\}\\|\\{') AS repairStrategies
            FROM (
                -- Step 1: extract the scalar fields and the raw JSON array as strings.
                SELECT get_json_object(helper_json.line, '$.version') AS version,
                       get_json_object(helper_json.line, '$.channelOutcome.MG.overrideable') AS overrideable,
                       get_json_object(helper_json.line, '$.channelOutcome.MG.repairStrategies') AS repairStrategies
                FROM helper_json
            ) one
        ) two
    ) s LATERAL VIEW explode(s.rs_array) t AS strategy
) three;
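The nested query above can probably be collapsed into a single SELECT. The following is only a sketch of that idea, not the answerer's method. It assumes that split() accepts Java regex lookaround (Hive hands the pattern to Java's regex engine) and it uses the built-in json_tuple UDTF instead of repeated get_json_object calls. The aliases j, t, jt and the column name repair_rank are made up for illustration.

```sql
SELECT
    get_json_object(j.line, '$.version') AS version,
    get_json_object(j.line, '$.channelOutcome.MG.overrideable') AS overrideable,
    jt.scenario,
    jt.repairType,
    jt.repair_rank,
    jt.notificationType
FROM helper_json j
-- Strip the outer [ and ], then split on every comma that sits between "}" and "{".
-- The lookbehind/lookahead keep the braces, so each piece is still a valid JSON object.
LATERAL VIEW explode(
    split(
        regexp_replace(
            get_json_object(j.line, '$.channelOutcome.MG.repairStrategies'),
            '^\\[|\\]$', ''),
        '(?<=\\}),(?=\\{)')
) t AS strategy
-- json_tuple parses each strategy object once and returns all four keys as strings.
LATERAL VIEW json_tuple(t.strategy, 'scenario', 'repairType', 'rank', 'notificationType')
    jt AS scenario, repairType, repair_rank, notificationType;
```

Like the original query, this relies on the sequence "},{" never appearing inside a string value of the array; data that violates that assumption would be split in the wrong place.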

Here helper_json has the following schema.

hive (vijay)> describe helper_json; 
OK 
line     string     None 
Time taken: 0.056 seconds, Fetched: 1 row(s) 
hive (vijay)> select * from helper_json; 
OK 
{"channelOutcome":{"MG":{"repairStrategies":[{"scenario":"1","repairType":"ISR","rank":1,"notificationType":"Z5"},{"scenario":"1","repairType":"SER","rank":2,"notificationType":"NO"},{"scenario":"1","repairType":"ACC","rank":3,"notificationType":"Z5"},{"scenario":"1","repairType":"SWP","rank":4,"notificationType":"Z5"},{"scenario":"4","repairType":"RMS","rank":5,"notificationType":"Z8"}],"overrideable":false}},"keyValues":[],"version":2.3} 
Time taken: 0.144 seconds, Fetched: 1 row(s) 
hive (vijay)> 
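A table like helper_json could be set up along the following lines. This is a sketch under assumptions: the file path is hypothetical, and the JSON document is assumed to have been flattened to a single line per record, because a plain text table treats every newline as a row boundary.

```sql
-- A single STRING column; with the default text format each input line
-- ends up as one row holding the whole JSON document.
CREATE TABLE IF NOT EXISTS helper_json (line STRING);

-- '/tmp/sample.json' is a hypothetical local path to the one-line-per-record file.
LOAD DATA LOCAL INPATH '/tmp/sample.json' OVERWRITE INTO TABLE helper_json;
```

If the source file is pretty-printed as in the question, it would need to be collapsed to one line first, otherwise get_json_object sees only fragments and returns NULL.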

Output: added to make it clearer what the result looks like.

Total MapReduce jobs = 1 
Launching Job 1 out of 1 
Number of reduce tasks is set to 0 since there's no reduce operator 
Starting Job = job_201503240233_5513, Tracking URL = http://dragon1:50030/jobdetails.jsp?jobid=job_201503250213_4613 
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201503240233_5513 
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 
2015-07-04 05:06:51,144 Stage-1 map = 0%, reduce = 0% 
2015-07-04 05:06:56,178 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec 
2015-07-04 05:06:57,184 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.5 sec 
2015-07-04 05:06:58,191 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.5 sec 
MapReduce Total cumulative CPU time: 1 seconds 500 msec 
Ended Job = job_201503250213_4613 
MapReduce Jobs Launched: 
Job 0: Map: 1 Cumulative CPU: 1.5 sec HDFS Read: 667 HDFS Write: 105 SUCCESS 
Total MapReduce CPU Time Spent: 1 seconds 500 msec 
OK 
version overrideable scenario  repairtype  rank notificationtype 
2.3  false 1  ISR  1  Z5 
2.3  false 1  SER  2  NO 
2.3  false 1  ACC  3  Z5 
2.3  false 1  SWP  4  Z5 
2.3  false 4  RMS  5  Z8 
Time taken: 15.831 seconds, Fetched: 5 row(s) 
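Note that get_json_object returns strings, so every column in the result above is a string. If typed columns are wanted, a cast can be added. The snippet below is a minimal sketch for the two top-level fields only; comparing against the string 'true' is a choice made here to avoid depending on how a given Hive version casts strings to BOOLEAN.

```sql
SELECT
    -- "2.3" as a string becomes the DOUBLE 2.3
    CAST(get_json_object(line, '$.version') AS DOUBLE) AS version,
    -- explicit comparison yields a real BOOLEAN column
    (get_json_object(line, '$.channelOutcome.MG.overrideable') = 'true') AS overrideable
FROM helper_json;
```

The rank field of each strategy could be converted the same way with CAST(... AS INT) once the array has been exploded.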

Thanks Vijay, this is a perfect solution for my problem.