PIG script error while trying to store in AVRO format: Datum 2 is not in union ["null","string"]

2017-06-12

I am trying to store data in AVRO format but cannot understand why I get the error "Datum 2 is not in union ["null","string"]". What does it mean?

Parsing the XML:

REGISTER piggybank.jar 
REGISTER /opt/cloudera/parcels/CDH/lib/pig/lib/avro.jar 
REGISTER /opt/cloudera/parcels/CDH/lib/pig/lib/json-simple-1.1.jar 
REGISTER /opt/cloudera/parcels/CDH/lib/pig/lib/snappy-java-1.0.4.1.jar 
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath() 
DEFINE XMLLoader org.apache.pig.piggybank.storage.XMLLoader('data') 
A = LOAD 'input' using XMLLoader as (x:chararray); 
R = RANK A; 
B = FOREACH R GENERATE 
     $0 as (id:chararray), 
     ToString(CurrentTime(),'yyyy-MM-dd HH:mm:ss.SSSSSS') as (CreatedTS:chararray), 
     CONCAT((chararray)XPath(x, 'data/key'), '_', ToString(CurrentTime(), 'yyMMddHmm'),'_',(chararray)$0) as (FileName:chararray), 
     XPath(x, 'data/title') as (title:chararray), 
     XPath(x, 'data/city') as (city:chararray), 
     XPath(x, 'data/country') as (country:chararray), 
     XPath(x, 'data/text') as (text:chararray), 
     XPath(x, 'data/empty_text') as (empty_text:chararray); 
C = DISTINCT B; 
DUMP C; 

Output of DUMP C:

(2,2017-06-12 14:21:35.937000,f385a4_1706121421_2,Data text for two,תל אביב -יפו,IL,תל אביב -יפו, מחוז תל אביב,) 
(3,2017-06-12 14:21:35.937000,657e21_1706121421_3,Data text three,תל אביב -יפו,IL,תל אביב -יפו,) 
(4,2017-06-12 14:21:35.937000,5700da_1706121421_4,Data text four,Dublin,IE,Text data for example,) 
(1,2017-06-12 14:21:35.937000,22bafc_1706121421_1,Data text one,Letterkenny,IE,Text data for example,) 

Store:

STORE C INTO 'output' USING org.apache.pig.piggybank.storage.avro.AvroStorage(); 

2017-06-12 14:57:19,857 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1496327466789_0452_r_000000_3 Info:Error: java.io.IOException: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum 2 is not in union ["null","string"] 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:469) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256) 
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) 
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) 
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum 2 is not in union ["null","string"] 
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308) 
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49) 
    at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:749) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98) 
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558) 
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) 
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:467) 
    ... 11 more 
Caused by: java.lang.RuntimeException: Datum 2 is not in union ["null","string"] 
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.resolveUnionSchema(PigAvroDatumWriter.java:128) 
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeUnion(PigAvroDatumWriter.java:111) 
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:82) 
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeRecord(PigAvroDatumWriter.java:365) 
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105) 
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73) 
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99) 
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60) 
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:302) 
    ... 19 more 

I tried to define the AVRO schema explicitly, but without any success; the same error occurred:

STORE C INTO 'output' USING org.apache.pig.piggybank.storage.avro.AvroStorage(
'schema', '{ 
    "name": "Myschema", 
    "type": "record", 
    "fields": 
    [ 
     {"name": "id", "type": ["string", "null"]}, 
     {"name": "FileName", "type": ["string", "null"]}, 
     {"name": "title", "type": ["string", "null"]}, 
     {"name": "city", "type": ["string", "null"]}, 
     {"name": "country", "type": ["string", "null"]}, 
     {"name": "text", "type": ["string", "null"]}, 
     {"name": "empty_text", "type": ["string", "null"]} 
    ] 
    }' 
); 

All fields are of type String (chararray), since I got them from XML.

DESCRIBE C:

DESCRIBE C; 

C: {id: chararray,CreatedTS: chararray,FileName: chararray,title: chararray,city: chararray,country: chararray,text: chararray,empty_text: chararray} 

I could not find any information on the internet apart from this: http://www.gauravp.com/2014/06/pig-error-error-2997-encountered.html Please, can someone explain this?

Answer


Solved by an explicit cast in Pig: (chararray)$0 as (id:chararray). RANK returns a long, and that was the root cause. So always check the types of your fields and the return types, e.g. of UDFs.
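Applied to the script from the question, the fix is a single explicit cast on the rank column. A minimal sketch (it assumes the same XMLLoader and XPath DEFINEs as above and keeps only two fields for brevity):

```
A = LOAD 'input' USING XMLLoader AS (x:chararray);
R = RANK A;
-- RANK prepends a rank column of type long. Without a cast, AvroStorage
-- writes a long datum into the ["null","string"] union it inferred for
-- the chararray field, which triggers the error above.
B = FOREACH R GENERATE
        (chararray)$0 AS id,
        XPath(x, 'data/title') AS title;
STORE B INTO 'output' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
```

After the cast, the datum handed to the Avro writer really is a string, so it matches the ["null","string"] union and the STORE succeeds.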