Using external native libraries (.so) and external jars with Hadoop MapReduce

I am new to the Hadoop/Java world, so please be gentle and feel free to correct any egregious errors. I want to use native libraries compiled on my Ubuntu machine with Hadoop running locally (standalone mode). In addition to the .jar I compiled, I am trying to use an external .jar. I tried and failed to build a fat jar, so I decided instead to pass the external jar and the native libraries to Hadoop on the command line. The libraries are used by a custom record reader I created. I can run a MapReduce job that has no external libraries through the hadoop command, and I am also able to run this program in Eclipse when I set the LD_LIBRARY_PATH environment variable. I am not sure which variables need to be set for the job to run successfully in Hadoop, so please tell me if any are necessary; I have already tried setting $HADOOP_CLASSPATH. The command is:
./bin/hadoop jar ~/myjar/cdf-11-16.jar CdfInputDriver -libjars cdfjava.jar -files libcdf.so,libcdfNativeLibrary.so input output
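Separately, this is the sort of environment setup I have been experimenting with before invoking hadoop (the paths are placeholders for wherever the jar and .so files actually live on my machine):

# Placeholder paths -- substitute the real locations of the jar and .so files.
export HADOOP_CLASSPATH=~/myjar/cdfjava.jar        # external jar, for the client-side classpath
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/myjar    # directory holding libcdf.so and libcdfNativeLibrary.so
./bin/hadoop jar ~/myjar/cdf-11-16.jar CdfInputDriver -libjars cdfjava.jar -files libcdf.so,libcdfNativeLibrary.so input output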
I have tried accessing the jar and the .so files from my local filesystem, and I have also tried copying them to HDFS.
I get the following error from the job:
Exception in thread "main" java.lang.NoClassDefFoundError: gsfc/nssdc/cdf/CDFConstants
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:174)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:490)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at CdfInputDriver.run(CdfInputDriver.java:45)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at CdfInputDriver.main(CdfInputDriver.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: gsfc.nssdc.cdf.CDFConstants
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 36 more
I tried to check whether the files were being loaded into the distributed cache with the code below, and it prints "cache files:" as null:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class CdfInputDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf());

        // Debug output: this prints "cache files:null".
        System.out.println("cache files:" + getConf().get("mapreduce.job.cache.files"));
        Path[] uris = job.getLocalCacheFiles();
        if (uris != null) {  // guard: null when nothing has been cached
            for (Path uri : uris) {
                System.out.println(uri.toString());
                System.out.println(uri.getName());
            }
        }

        job.setJarByClass(getClass());
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setInputFormatClass(CdfInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(CdfMapper.class);
        //job.setReducerClass(WordCount.IntSumReducer.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new CdfInputDriver(), args);
        System.exit(exitCode);
    }
}
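As a possible fallback, I understand the distributed cache can also be populated programmatically inside run() instead of via -libjars/-files, along the lines of the sketch below. The HDFS paths are hypothetical, and I have not confirmed this fixes the classpath problem:

// Hypothetical alternative: register the files from inside run(),
// assuming they have already been copied to HDFS under /libs.
job.addFileToClassPath(new Path("/libs/cdfjava.jar"));                   // task-side classpath
job.addCacheFile(new java.net.URI("hdfs:///libs/libcdf.so#libcdf.so")); // symlinked into the task working dir
job.addCacheFile(new java.net.URI("hdfs:///libs/libcdfNativeLibrary.so#libcdfNativeLibrary.so"));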
Also, I am only testing this locally for now; the job will inevitably run on Amazon EMR. Would storing the .so and .jar files on S3 and using a similar method work, in theory?
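In other words, would an invocation along these lines be expected to work on EMR (the bucket name is made up)?

hadoop jar cdf-11-16.jar CdfInputDriver -libjars s3://my-bucket/cdfjava.jar -files s3://my-bucket/libcdf.so,s3://my-bucket/libcdfNativeLibrary.so input output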
Any help is appreciated!