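Running a Hadoop job from another Java program
I am writing a program that receives the source code of mappers/reducers, dynamically compiles the mappers/reducers, and makes a JAR file out of them. It then has to run this JAR file on a Hadoop cluster.
For the last part, I set all the required parameters dynamically through my code. However, the problem I am facing now is that the code requires the compiled mapper and reducer classes at compile time. At compile time I do not have these classes; they are only received later, at run time (for example, through a message received from a remote node). I would appreciate any ideas/suggestions on how to get past this problem.
Below you can find the code for the last part. The problem is that job.setMapperClass(Mapper_Class.class) and job.setReducerClass(Reducer_Class.class) require the class files (Mapper_Class.class and Reducer_Class.class) to be present at compile time: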
private boolean run_Hadoop_Job(String className){
    try{
        System.out.println("Starting to run the code on Hadoop...");
        String[] argsTemp = { "project_test/input", "project_test/output" };
        // create a configuration
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:54310");
        conf.set("mapred.job.tracker", "localhost:54311");
        conf.set("mapred.jar", jar_Output_Folder+ java.io.File.separator
                + className+".jar");
        conf.set("mapreduce.map.class", "Mapper_Reducer_Classes$Mapper_Class.class");
        conf.set("mapreduce.reduce.class", "Mapper_Reducer_Classes$Reducer_Class.class");
        // create a new job based on the configuration
        Job job = new Job(conf, "Hadoop Example for dynamically and programmatically compiling-running a job");
        job.setJarByClass(Platform.class);
        //job.setMapperClass(Mapper_Class.class);
        //job.setReducerClass(Reducer_Class.class);
        // key/value of your reducer output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(argsTemp[0]));
        // this deletes possible output paths to prevent job failures
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(argsTemp[1]);
        fs.delete(out, true);
        // finally set the empty out path
        FileOutputFormat.setOutputPath(job, new Path(argsTemp[1]));
        //job.submit();
        System.exit(job.waitForCompletion(true) ? 0 : 1);
        System.out.println("Job Finished!");
    } catch (Exception e) { return false; }
    return true;
}
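One possible way around the compile-time reference in job.setMapperClass(...) is to resolve the class by name only at run time. The sketch below uses only the JDK: it compiles a small source file with javax.tools.JavaCompiler and loads the result through a URLClassLoader. The names RuntimeClassLoading and ReceivedAtRuntime are hypothetical stand-ins (the latter plays the role of a mapper received as source code), and the Hadoop call is mentioned only in a comment because Hadoop is not on the classpath here.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class RuntimeClassLoading {

    // Compiles one source file into outDir with the compiler bundled in the JDK.
    static void compile(Path sourceFile, Path outDir) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler available (a JDK is required)");
        }
        int status = compiler.run(null, null, null,
                "-d", outDir.toString(), sourceFile.toString());
        if (status != 0) {
            throw new IllegalStateException("Compilation failed with status " + status);
        }
    }

    // Loads a class by its binary name from a directory or jar unknown at compile time.
    static Class<?> load(Path classesDirOrJar, String binaryName) throws Exception {
        URL[] urls = { classesDirOrJar.toUri().toURL() };
        URLClassLoader loader =
                new URLClassLoader(urls, RuntimeClassLoading.class.getClassLoader());
        return Class.forName(binaryName, true, loader);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("dyn");
        Path src = dir.resolve("ReceivedAtRuntime.java");
        // Hypothetical stand-in for source code received from a remote node.
        String code = "public class ReceivedAtRuntime {"
                + " public String greet() { return \"hi\"; } }";
        Files.write(src, code.getBytes(StandardCharsets.UTF_8));

        compile(src, dir);
        Class<?> cls = load(dir, "ReceivedAtRuntime");
        Object instance = cls.getDeclaredConstructor().newInstance();
        System.out.println(cls.getMethod("greet").invoke(instance));

        // With Hadoop on the classpath, a class loaded this way could be handed over as
        // job.setMapperClass(cls.asSubclass(Mapper.class)); this line is not compiled here.
    }
}
```

Loading the class on the client only removes the compile-time dependency. The task JVMs on the cluster still need the jar that contains the class, so the jar has to be shipped with the job as well.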
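Edit: I have now modified the code to specify the mapper and reducer using conf.set("mapreduce.map.class", "my mapper.class"). The code now compiles correctly, but when executed it throws the following error: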
Dec 24, 2012 6:49:43 AM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Task Id : attempt_201212240511_0006_m_000001_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Mapper_Reducer_Classes$Mapper_Class.class
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
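One detail in the trace is worth checking: the name that could not be found is Mapper_Reducer_Classes$Mapper_Class.class, including the trailing ".class" passed to conf.set. As far as I can tell, Configuration.getClass resolves the configured string as a class name, so the suffix would be treated as part of the name. The sketch below illustrates the effect with plain Class.forName, using the JDK's nested class java.util.Map$Entry as a stand-in; BinaryNameLookup is a hypothetical name, and this is only one possible cause, since the jar containing the mapper must also be visible to the task.

```java
class BinaryNameLookup {

    // Returns true if the given string resolves to a loadable class.
    static boolean resolves(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Binary name of a nested class: Outer$Inner, without a file suffix.
        System.out.println(resolves("java.util.Map$Entry"));
        // Same name with ".class" appended, as in the configuration values above.
        System.out.println(resolves("java.util.Map$Entry.class"));
    }
}
```

If this is the cause, the configured value would be the binary name alone, for example "Mapper_Reducer_Classes$Mapper_Class", prefixed with the package name if the class is declared in a package.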
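You have to add the Hadoop jars to the property named 'tmpjars'. So it would work like this: 'conf.set("tmpjars", "/usr/local/hadoop/hadoop-core.jar,/usr/local/hadoop/hadoop-example.jar")'. The jar paths must be separated by commas. Note that this is quite inconvenient, and you have to make sure that these jars actually exist on the client machine (so that Hadoop can copy them to HDFS and the task trackers can download them).
Thanks Thomas. I figured that part out and my code now compiles correctly. However, it throws some errors during execution. I have modified my original post to reflect this. Any ideas? – reza
Did you explicitly add the jar containing your mapper to 'tmpjars'?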
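The 'tmpjars' suggestion in the comments above can be sketched as a small helper that builds the comma-separated value and fails early when a jar is missing on the client machine. TmpJarsValue and build are hypothetical names, the sketch uses only the JDK, and the actual conf.set call appears only in a comment.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TmpJarsValue {

    // Joins jar paths with commas after checking that each jar exists locally.
    static String build(List<String> jarPaths) {
        List<String> checked = new ArrayList<>();
        for (String p : jarPaths) {
            Path jar = Paths.get(p);
            if (!Files.isRegularFile(jar)) {
                throw new IllegalArgumentException("Jar not found on client: " + p);
            }
            checked.add(jar.toAbsolutePath().toString());
        }
        return String.join(",", checked);
    }

    public static void main(String[] args) throws Exception {
        // Temporary files stand in for real jars such as the one holding the mapper.
        Path a = Files.createTempFile("hadoop-core", ".jar");
        Path b = Files.createTempFile("my-mapper", ".jar");
        String value = build(Arrays.asList(a.toString(), b.toString()));
        System.out.println(value);
        // In the job setup this value would be passed as conf.set("tmpjars", value);
    }
}
```

Checking the files up front turns a missing jar into an immediate error on the client rather than a failed task attempt on the cluster.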