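Hadoop/Rubydoop + Cloud⁹: Found class, but interface was expected

I want to run a Hadoop job on the Wikipedia history dump XML files using Rubydoop. So far I have managed to load Cloud⁹'s XMLInputFormat Java class and map it to a Ruby class: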
module Cloud9
require 'java'
require File.expand_path('../../cloud9-1.5.0.jar', __FILE__)
require File.expand_path('../../hadoop-core-1.2.1.jar', __FILE__)
require File.expand_path('../../commons-logging-1.1.1.jar', __FILE__)
java_import 'edu.umd.cloud9.collection.XMLInputFormat'
end
module Wikipedia
class XmlInputFormat < ::Cloud9::XMLInputFormat
end
end
I then added XmlInputFormat to the Rubydoop job configuration block:
input input_path, format: Wikipedia::XmlInputFormat
When I run the job, I get the following error once the process of splitting on the <page> and </page> tags has started:
java.lang.Exception: java.lang.IncompatibleClassChangeError:
Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at edu.umd.cloud9.collection.XMLInputFormat$XMLRecordReader.initialize(XMLInputFormat.java:102)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:521)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
I am running Hadoop 2.1.2 locally with cloud9-1.5.0.jar and Rubydoop 1.1.0.
So the question is: is this because Cloud⁹ and Rubydoop use incompatible Hadoop versions locally (old vs. new Hadoop API)? How can it be fixed?
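
For what it's worth, org.apache.hadoop.mapreduce.TaskAttemptContext is a concrete class in Hadoop 1.x and an interface in Hadoop 2.x, so one way to see which flavour JRuby actually loads is to ask the class itself. Below is a minimal sketch, not something I have confirmed fixes anything: both helper names are made up for illustration, and the second one assumes JRuby with the Hadoop jars already required as in the Cloud9 module above.

```ruby
# Hypothetical helper (not part of Rubydoop or Cloud9): reports whether a
# Java class object describes an interface or a concrete class.
# Works with anything that responds to `interface?`.
def kind_of_java_type(java_class)
  java_class.interface? ? :interface : :class
end

# JRuby only. Assumes the Hadoop jars have already been required, as in the
# Cloud9 module above. Call this by hand, e.g. from the job setup script.
def task_attempt_context_kind
  require 'java'
  context = Java::OrgApacheHadoopMapreduce::TaskAttemptContext
  kind_of_java_type(context.java_class)
end
```

If task_attempt_context_kind returns :class, the Hadoop 1.x API (for example from hadoop-core-1.2.1.jar) is the one being picked up at runtime, which would match the "Found class ..., but interface was expected" message coming from code compiled against the Hadoop 2.x interface.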