2016-10-24

Nested JSON data in Apache Pig

The JSON payload I have is:

{"time": "2015-06-30T23:00:00Z", 
    "type": "analysis", 
    "revision": "0.8", 
    "hostname": "iem6.local", 
    "data": [ 
    {"gid": 1, "tmpc": 28.00, "wawa": [""], "ptype": 10, "dwpc": 17.40, "smps": 6.2, "drct": 99, "vsby": 16.093, "roadtmpc": 39.10,"srad": 77.61, "snwd": 0.00, "pcpn": 0.00}, 
{"gid": 213840, "tmpc": 22.00, "wawa": [""], "ptype": 10, "dwpc": 13.70, "smps": 5.7, "drct": 350, "vsby": 16.093, "roadtmpc": 32.70,"srad": 249.50, "snwd": 0.00, "pcpn": 0.00}]} 

I am trying to load the data using Apache Pig's JsonLoader.

data_raw = LOAD '205006.json' using JsonLoader('time:chararray,type:chararray,revision:chararray,hostname:chararray,data:(gid:int,tmpc:float,wawa:{(a:chararray)},ptype:int,dwpc:float)'); 

However, dumping the result gives incorrect output:

(2015-06-30T23:00:00Z,,,,) 
(,,,,) 
(,,,,) 
(,,,,) 
(,,,,) 
(1,28.00,[,],) 
(2,28.00,[,],) 

Warnings thrown:

2016-10-24 15:43:55,852 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, returning null for {"time": "2015-06-30T23:00:00Z", 
2016-10-24 15:43:55,871 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find start of record  "type": "analysis", 
2016-10-24 15:43:55,872 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find start of record  "revision": "0.8", 
2016-10-24 15:43:55,872 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find start of record  "hostname": "iem6.local", 
2016-10-24 15:43:55,872 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find start of record  "data": [ 
2016-10-24 15:43:55,872 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad tuple field, could not find start of object, field 4 
2016-10-24 15:43:55,873 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find end of record  {"gid": 1, "tmpc": 28.00, "wawa": [""], "ptype": 10, "dwpc": 17.40, "smps": 6.2, "drct": 99, "vsby": 16.093, "roadtmpc": 39.10,"srad": 77.61, "snwd": 0.00, "pcpn": 0.00}, 
2016-10-24 15:43:55,873 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad tuple field, could not find start of object, field 4 

I cannot use Elephant Bird for this.

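Could you please post the complete JSON? It may be invalid (for example, a missing curly brace or square bracket). You can check its validity with http://jsonlint.com/. –

I edited the sample JSON data to show the first and last data points as an example. – Sumit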

Is it multi-line or single-line JSON? –

Answer

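First, join your JSON onto a single line. JsonLoader expects one JSON object per line, which is why every line of the multi-line file is reported as a bad record.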

Second, use the following Pig command:

data_raw = LOAD '205006.json' using JsonLoader('time:chararray,type:chararray,revision:chararray,hostname:chararray,data:{(gid:int,tmpc:float,wawa:{(chararray)},ptype:int, dwpc:float, smps:float, drct:int, vsby:float, roadtmpc:float, srad: float, snwd:float, pcpn:float)}');

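You should describe all the fields of the JSON string, in order. Note also that data is a JSON array, so it is declared as a bag of tuples, {(...)}, rather than as a single tuple.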
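For the step of joining the JSON onto a single line, one option is a small script outside Pig. The following is only a minimal sketch in Python, assuming the input file holds exactly one JSON document; the helper names `to_single_line` and `convert_file` are made up for illustration. Parsing the text first also serves as a validity check, and the key order of the original is preserved on output, which matters because the schema lists the fields in order.

```python
import json


def to_single_line(text: str) -> str:
    """Parse one JSON document and re-serialize it on a single line."""
    # json.loads raises json.JSONDecodeError if the JSON is invalid,
    # so this doubles as a basic validity check.
    record = json.loads(text)
    # Dict key order is preserved (Python 3.7+), so the fields stay in
    # the same order as in the original file.
    return json.dumps(record, separators=(",", ":"))


def convert_file(src_path: str, dst_path: str) -> None:
    """Rewrite a pretty-printed JSON file as a single one-line record."""
    with open(src_path, encoding="utf-8") as src:
        line = to_single_line(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(line + "\n")
```

A call such as `convert_file('205006.json', '205006_oneline.json')` (the output file name is hypothetical) would produce a file that can then be passed to LOAD. Numeric formatting may change slightly on re-serialization, for example 28.00 is written as 28.0.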