2015-01-08

Reading binary Avro in Pig

I am using Flume to write binary objects to HDFS, and my Flume agent's sink is configured like this:

a1.sinks.k1.type = hdfs 
a1.sinks.k1.channel = c1 
a1.sinks.k1.hdfs.path = /user/%y-%m-%d/%H%M/%S 
a1.sinks.k1.hdfs.filePrefix = events- 
a1.sinks.k1.hdfs.round = true 
a1.sinks.k1.hdfs.roundValue = 10 
a1.sinks.k1.hdfs.roundUnit = minute 

a1.sinks.k1.hdfs.fileType = DataStream 
a1.sinks.k1.hdfs.serializer = avro_event 
a1.sinks.k1.hdfs.serializer.syncIntervalBytes = 4096000 
a1.sinks.k1.hdfs.serializer.compressionCodec = snappy 
a1.sinks.k1.hdfs.serializer.appendNewline = false 
a1.sinks.k1.hdfs.fileSuffix=.avro 
a1.sinks.k1.hdfs.writeFormat=TEXT 

Now I want to read the HDFS file (something.avro) using this:

data = LOAD 'something.avro' 
     USING org.apache.pig.piggybank.storage.avro.AvroStorage(); 
dump data; 

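I keep getting the exception below. Any idea why I am getting it, or is there another way to read binary Avro objects in a Pig script without providing the Avro schema?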

Caused by: java.io.IOException: Not a data file. 
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105) 
at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84) 
at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.getSchema(AvroStorageUtils.java:718) 
at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:349) 
at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:277) 
at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:248) 
at org.apache.pig.piggybank.storage.avro.AvroStorage.setInputAvroSchema(AvroStorage.java:226) 
at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:434) 
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175) 

Answer

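I have the same problem here. I think it is because what we are reading is raw Avro binary data, which is not the same as an Avro data file (the container format, which starts with its own header).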
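One way to check this guess: the Avro specification says an object container file begins with the four bytes 'O', 'b', 'j', 1, and the "Not a data file" error in the trace is raised from DataFileStream.initialize while it reads the start of the file. Below is a minimal sketch in Python, assuming the file has first been copied out of HDFS to local disk; the function name looks_like_avro_data_file is made up for illustration and is not part of any library.

```python
import sys

# Per the Avro specification, an object container file starts with
# the ASCII bytes 'O', 'b', 'j' followed by the byte 1.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_data_file(path):
    """Return True if the local file at `path` starts with the Avro magic bytes.

    Hypothetical helper for illustration only. It checks nothing beyond the
    first four bytes, so it cannot tell whether the rest of the file is valid.
    """
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC


if __name__ == "__main__":
    # Usage: python check_avro.py part0.avro [more files...]
    for name in sys.argv[1:]:
        if looks_like_avro_data_file(name):
            print(name + ": has the Avro container header")
        else:
            print(name + ": no Avro container header (possibly raw Avro binary)")
```

If the check fails, the file is probably a stream of bare Avro-encoded records with no embedded schema, which would explain why AvroStorage cannot infer the schema from it.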

You could try the fragtojson command from avro-tools:

java -jar avro-tools-1.7.7.jar fragtojson part0.avro --schema-file schema.avsc

and see whether it can read the file. If you manage to read it in Pig, please post what you find.