
I am trying to serialize and store a JavaPairRDD as a Hadoop sequence file as follows, but Spark's saveAsNewAPIHadoopFile produces java.io.IOException: Could not find a serializer for the Value class.

JavaPairRDD<ImmutableBytesWritable, Put> putRdd = ... 
config.set("io.serializations","org.apache.hadoop.io.serializer.JavaSerialization,org.apache.hadoop.io.serializer.WritableSerialization"); 
putRdd.saveAsNewAPIHadoopFile(outputPath, ImmutableBytesWritable.class, Put.class, SequenceFileOutputFormat.class, config); 

But even though I set io.serializations, I still get the following exception:

2017-04-06 14:39:32,623 ERROR [Executor task launch worker-0] executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0) 
java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Put'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization. 
    at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1192) 
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1094) 
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273) 
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530) 
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64) 
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75) 
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1030) 
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1014) 
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
    at org.apache.spark.scheduler.Task.run(Task.scala:88) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
2017-04-06 14:39:32,669 ERROR [task-result-getter-0] scheduler.TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job 

Any ideas how I can fix this?


What kind of data are you writing to HBase? – Vidya


Thanks @Vidya, I've already found the fix and posted it below. – bachr

Answer


I found the fix: apparently Put (and all HBase mutations) have a dedicated serializer, MutationSerialization.

The following lines fix the issue:

config.setStrings("io.serializations", 
    config.get("io.serializations"), 
    MutationSerialization.class.getName(), 
    ResultSerialization.class.getName()); 
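
For context, here is a minimal sketch of how the fixed configuration ties back to the write from the question. It assumes the configuration is created with HBaseConfiguration.create() (the question does not show where config comes from); buildPutRdd() and the output path are illustrative placeholders, not part of the original code.

import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.hbase.HBaseConfiguration; 
import org.apache.hadoop.hbase.client.Put; 
import org.apache.hadoop.hbase.io.ImmutableBytesWritable; 
import org.apache.hadoop.hbase.mapreduce.MutationSerialization; 
import org.apache.hadoop.hbase.mapreduce.ResultSerialization; 
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat; 
import org.apache.spark.api.java.JavaPairRDD; 

// Assumed: the Hadoop/HBase configuration is created here. 
Configuration config = HBaseConfiguration.create(); 

// Keep the serializers that are already configured (WritableSerialization 
// covers the ImmutableBytesWritable keys) and append the HBase-specific 
// ones (MutationSerialization covers the Put values). 
config.setStrings("io.serializations", 
    config.get("io.serializations"), 
    MutationSerialization.class.getName(), 
    ResultSerialization.class.getName()); 

// buildPutRdd() is a hypothetical helper standing in for the elided 
// construction of the RDD in the question. 
JavaPairRDD<ImmutableBytesWritable, Put> putRdd = buildPutRdd(); 

String outputPath = "hdfs:///tmp/puts";  // illustrative target path 

putRdd.saveAsNewAPIHadoopFile( 
    outputPath, 
    ImmutableBytesWritable.class,    // key class 
    Put.class,                       // value class 
    SequenceFileOutputFormat.class,  // write a Hadoop SequenceFile 
    config); 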

Sharing since I ran into a very similar situation, but my type is a 'JavaPairRDD' with 'Result' values ('Result' imported from 'org.apache.hadoop.hbase.client.Result'), and using the classes above didn't help. Any idea which one I should use? – FisherCoder


'ResultSerialization' should be enough, but I still see a Spark serialization exception if I try 'putRdd.first()' or 'putRdd.collect()'. In my case I only want to store to HDFS or write back to HBase, and the code above is enough for that. – bachr
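
Based on this exchange, a minimal sketch of the Result-valued case might look like the following. It assumes the key type is ImmutableBytesWritable (the comments do not say); readFromHBase() and the output path are illustrative placeholders. As noted above, this only addresses writing the RDD out; driver-side actions like first() or collect() hit a separate Spark serialization issue.

import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.hbase.HBaseConfiguration; 
import org.apache.hadoop.hbase.client.Result; 
import org.apache.hadoop.hbase.io.ImmutableBytesWritable; 
import org.apache.hadoop.hbase.mapreduce.ResultSerialization; 
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat; 
import org.apache.spark.api.java.JavaPairRDD; 

Configuration config = HBaseConfiguration.create(); 

// ResultSerialization handles the Result values; the default 
// WritableSerialization still handles the ImmutableBytesWritable keys. 
config.setStrings("io.serializations", 
    config.get("io.serializations"), 
    ResultSerialization.class.getName()); 

// readFromHBase() is a hypothetical helper; the comments do not show 
// how the Result-valued RDD is produced. 
JavaPairRDD<ImmutableBytesWritable, Result> resultRdd = readFromHBase(); 

resultRdd.saveAsNewAPIHadoopFile( 
    "hdfs:///tmp/results",           // illustrative target path 
    ImmutableBytesWritable.class,    // key class 
    Result.class,                    // value class 
    SequenceFileOutputFormat.class,  // write a Hadoop SequenceFile 
    config); 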
