2014-04-17

I submitted the following MR job with the hadoop jar command on CDH5 beta 2, but it is not being run on the hadoop cluster; it only runs using LocalJobRunner:

hadoop jar ./hadoop-examples-0.0.1-SNAPSHOT.jar com.aravind.learning.hadoop.mapred.join.ReduceSideJoinDriver tech_talks/users.csv tech_talks/ratings.csv tech_talks/output/ReduceSideJoinDriver/ 

I also tried explicitly supplying the fs name and job tracker URL as follows, without any success:

hadoop jar ./hadoop-examples-0.0.1-SNAPSHOT.jar com.aravind.learning.hadoop.mapred.join.ReduceSideJoinDriver -Dfs.default.name=hdfs://abc.com:8020 -Dmapreduce.job.tracker=x.x.x.x:8021 tech_talks/users.csv tech_talks/ratings.csv tech_talks/output/ReduceSideJoinDriver/ 
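As an aside, on CDH5 MapReduce 2 runs on YARN, so mapreduce.job.tracker is ignored there; whether a job is submitted to the cluster or run in-process by LocalJobRunner is governed by mapreduce.framework.name, which defaults to local when no client configuration is on the classpath. A sketch of an invocation that forces YARN submission (the hostnames below are placeholders, not values from this question):

```shell
hadoop jar ./hadoop-examples-0.0.1-SNAPSHOT.jar \
  com.aravind.learning.hadoop.mapred.join.ReduceSideJoinDriver \
  -Dmapreduce.framework.name=yarn \
  -Dfs.defaultFS=hdfs://namenode.example.com:8020 \
  -Dyarn.resourcemanager.address=rm.example.com:8032 \
  tech_talks/users.csv tech_talks/ratings.csv tech_talks/output/ReduceSideJoinDriver/
```

Note that fs.defaultFS is the non-deprecated replacement for fs.default.name.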

The job runs successfully, but it uses LocalJobRunner instead of being submitted to the cluster. The output is written to HDFS and is correct. Explicitly specifying the fs and job tracker as above made no difference. I am not sure what I am doing wrong here, so your input is greatly appreciated. The job log follows:

14/04/16 20:35:44 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id 
14/04/16 20:35:44 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 
14/04/16 20:35:45 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String). 
14/04/16 20:35:45 INFO input.FileInputFormat: Total input paths to process : 2 
14/04/16 20:35:45 INFO mapreduce.JobSubmitter: number of splits:2 
14/04/16 20:35:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1427968352_0001 
14/04/16 20:35:46 WARN conf.Configuration: file:/tmp/hadoop-ird2/mapred/staging/ird21427968352/.staging/job_local1427968352_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 
14/04/16 20:35:46 WARN conf.Configuration: file:/tmp/hadoop-ird2/mapred/staging/ird21427968352/.staging/job_local1427968352_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 
14/04/16 20:35:46 WARN conf.Configuration: file:/tmp/hadoop-ird2/mapred/local/localRunner/ird2/job_local1427968352_0001/job_local1427968352_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 
14/04/16 20:35:46 WARN conf.Configuration: file:/tmp/hadoop-ird2/mapred/local/localRunner/ird2/job_local1427968352_0001/job_local1427968352_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 
14/04/16 20:35:46 INFO mapreduce.Job: The url to track the job: http://localhost:8080/ 
14/04/16 20:35:46 INFO mapreduce.Job: Running job: job_local1427968352_0001 
14/04/16 20:35:46 INFO mapred.LocalJobRunner: OutputCommitter set in config null 
14/04/16 20:35:46 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter 
14/04/16 20:35:46 INFO mapred.LocalJobRunner: Waiting for map tasks 
14/04/16 20:35:46 INFO mapred.LocalJobRunner: Starting task: attempt_local1427968352_0001_m_000000_0 
14/04/16 20:35:46 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 
14/04/16 20:35:46 INFO mapred.MapTask: Processing split: hdfs://...:8020/user/ird2/tech_talks/ratings.csv:0+4388258 
14/04/16 20:35:46 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 
14/04/16 20:35:46 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584) 
14/04/16 20:35:46 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100 
14/04/16 20:35:46 INFO mapred.MapTask: soft limit at 83886080 
14/04/16 20:35:46 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600 
14/04/16 20:35:46 INFO mapred.MapTask: kvstart = 26214396; length = 6553600 
14/04/16 20:35:47 INFO mapreduce.Job: Job job_local1427968352_0001 running in uber mode : false 
14/04/16 20:35:47 INFO mapreduce.Job: map 0% reduce 0% 
14/04/16 20:35:48 INFO mapred.LocalJobRunner: 
14/04/16 20:35:48 INFO mapred.MapTask: Starting flush of map output 
14/04/16 20:35:48 INFO mapred.MapTask: Spilling map output 
14/04/16 20:35:48 INFO mapred.MapTask: bufstart = 0; bufend = 6485388; bufvoid = 104857600 
14/04/16 20:35:48 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 24860980(99443920); length = 1353417/6553600 
14/04/16 20:35:49 INFO mapred.MapTask: Finished spill 0 
14/04/16 20:35:49 INFO mapred.Task: Task:attempt_local1427968352_0001_m_000000_0 is done. And is in the process of committing 
14/04/16 20:35:49 INFO mapred.LocalJobRunner: map 
14/04/16 20:35:49 INFO mapred.Task: Task 'attempt_local1427968352_0001_m_000000_0' done. 
14/04/16 20:35:49 INFO mapred.LocalJobRunner: Finishing task: attempt_local1427968352_0001_m_000000_0 
14/04/16 20:35:49 INFO mapred.LocalJobRunner: Starting task: attempt_local1427968352_0001_m_000001_0 
14/04/16 20:35:49 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 
14/04/16 20:35:49 INFO mapred.MapTask: Processing split: hdfs://...:8020/user/ird2/tech_talks/users.csv:0+186304 
14/04/16 20:35:49 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 
14/04/16 20:35:49 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584) 
14/04/16 20:35:49 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100 
14/04/16 20:35:49 INFO mapred.MapTask: soft limit at 83886080 
14/04/16 20:35:49 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600 
14/04/16 20:35:49 INFO mapred.MapTask: kvstart = 26214396; length = 6553600 
14/04/16 20:35:49 INFO mapred.LocalJobRunner: 
14/04/16 20:35:49 INFO mapred.MapTask: Starting flush of map output 
14/04/16 20:35:49 INFO mapred.MapTask: Spilling map output 
14/04/16 20:35:49 INFO mapred.MapTask: bufstart = 0; bufend = 209667; bufvoid = 104857600 
14/04/16 20:35:49 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26192144(104768576); length = 22253/6553600 
14/04/16 20:35:49 INFO mapred.MapTask: Finished spill 0 
14/04/16 20:35:49 INFO mapred.Task: Task:attempt_local1427968352_0001_m_000001_0 is done. And is in the process of committing 
14/04/16 20:35:49 INFO mapred.LocalJobRunner: map 
14/04/16 20:35:49 INFO mapred.Task: Task 'attempt_local1427968352_0001_m_000001_0' done. 
14/04/16 20:35:49 INFO mapred.LocalJobRunner: Finishing task: attempt_local1427968352_0001_m_000001_0 
14/04/16 20:35:49 INFO mapred.LocalJobRunner: map task executor complete. 
14/04/16 20:35:49 INFO mapred.LocalJobRunner: Waiting for reduce tasks 
14/04/16 20:35:49 INFO mapred.LocalJobRunner: Starting task: attempt_local1427968352_0001_r_000000_0 
14/04/16 20:35:49 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 
14/04/16 20:35:49 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: [email protected] 
14/04/16 20:35:49 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=652528832, maxSingleShuffleLimit=163132208, mergeThreshold=430669056, ioSortFactor=10, memToMemMergeOutputsThreshold=10 
14/04/16 20:35:49 INFO reduce.EventFetcher: attempt_local1427968352_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events 
14/04/16 20:35:49 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1427968352_0001_m_000001_0 decomp: 220797 len: 220801 to MEMORY 
14/04/16 20:35:49 INFO reduce.InMemoryMapOutput: Read 220797 bytes from map-output for attempt_local1427968352_0001_m_000001_0 
14/04/16 20:35:49 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 220797, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->220797 
14/04/16 20:35:49 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1427968352_0001_m_000000_0 decomp: 7162100 len: 7162104 to MEMORY 
14/04/16 20:35:49 INFO reduce.InMemoryMapOutput: Read 7162100 bytes from map-output for attempt_local1427968352_0001_m_000000_0 
14/04/16 20:35:49 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 7162100, inMemoryMapOutputs.size() -> 2, commitMemory -> 220797, usedMemory ->7382897 
14/04/16 20:35:49 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning 
14/04/16 20:35:49 INFO mapred.LocalJobRunner: 2/2 copied. 
14/04/16 20:35:49 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs 
14/04/16 20:35:49 INFO mapred.Merger: Merging 2 sorted segments 
14/04/16 20:35:49 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 7382885 bytes 
14/04/16 20:35:50 INFO reduce.MergeManagerImpl: Merged 2 segments, 7382897 bytes to disk to satisfy reduce memory limit 
14/04/16 20:35:50 INFO reduce.MergeManagerImpl: Merging 1 files, 7382899 bytes from disk 
14/04/16 20:35:50 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce 
14/04/16 20:35:50 INFO mapred.Merger: Merging 1 sorted segments 
14/04/16 20:35:50 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 7382889 bytes 
14/04/16 20:35:50 INFO mapred.LocalJobRunner: 2/2 copied. 
14/04/16 20:35:50 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords 
14/04/16 20:35:50 INFO mapreduce.Job: map 100% reduce 0% 
14/04/16 20:35:51 INFO mapred.Task: Task:attempt_local1427968352_0001_r_000000_0 is done. And is in the process of committing 
14/04/16 20:35:51 INFO mapred.LocalJobRunner: 2/2 copied. 
14/04/16 20:35:51 INFO mapred.Task: Task attempt_local1427968352_0001_r_000000_0 is allowed to commit now 
14/04/16 20:35:51 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1427968352_0001_r_000000_0' to hdfs://...:8020/user/ird2/tech_talks/output/ReduceSideJoinDriver/_temporary/0/task_local1427968352_0001_r_000000 
14/04/16 20:35:51 INFO mapred.LocalJobRunner: reduce > reduce 
14/04/16 20:35:51 INFO mapred.Task: Task 'attempt_local1427968352_0001_r_000000_0' done. 
14/04/16 20:35:51 INFO mapred.LocalJobRunner: Finishing task: attempt_local1427968352_0001_r_000000_0 
14/04/16 20:35:51 INFO mapred.LocalJobRunner: reduce task executor complete. 
14/04/16 20:35:52 INFO mapreduce.Job: map 100% reduce 100% 
14/04/16 20:35:52 INFO mapreduce.Job: Job job_local1427968352_0001 completed successfully 
14/04/16 20:35:52 INFO mapreduce.Job: Counters: 38 
     File System Counters 
       FILE: Number of bytes read=14767932 
       FILE: Number of bytes written=29952985 
       FILE: Number of read operations=0 
       FILE: Number of large read operations=0 
       FILE: Number of write operations=0 
       HDFS: Number of bytes read=13537382 
       HDFS: Number of bytes written=2949787 
       HDFS: Number of read operations=28 
       HDFS: Number of large read operations=0 
       HDFS: Number of write operations=5 
     Map-Reduce Framework 
       Map input records=343919 
       Map output records=343919 
       Map output bytes=6695055 
       Map output materialized bytes=7382905 
       Input split bytes=272 
       Combine input records=0 
       Combine output records=0 
       Reduce input groups=5564 
       Reduce shuffle bytes=7382905 
       Reduce input records=343919 
       Reduce output records=5564 
       Spilled Records=687838 
       Shuffled Maps =2 
       Failed Shuffles=0 
       Merged Map outputs=2 
       GC time elapsed (ms)=92 
       CPU time spent (ms)=0 
       Physical memory (bytes) snapshot=0 
       Virtual memory (bytes) snapshot=0 
       Total committed heap usage (bytes)=1416101888 
     Shuffle Errors 
       BAD_ID=0 
       CONNECTION=0 
       IO_ERROR=0 
       WRONG_LENGTH=0 
       WRONG_MAP=0 
       WRONG_REDUCE=0 
     File Input Format Counters 
       Bytes Read=4574562 
     File Output Format Counters 
       Bytes Written=2949787 

Driver code:

public class ReduceSideJoinDriver extends Configured implements Tool
{
    @Override
    public int run(String[] args) throws Exception
    {
        if (args.length != 3)
        {
            System.err.printf("Usage: %s [generic options] <users> <ratings> <output>\n", getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }

        Path usersFile = new Path(args[0]);
        Path ratingsFile = new Path(args[1]);

        Job job = Job.getInstance(getConf(), "Aravind - Reduce Side Join");

        // Without this the job jar is not shipped to the cluster, which is why the
        // log above warns "No job jar file set. User classes may not be found."
        job.setJarByClass(ReduceSideJoinDriver.class);

        job.getConfiguration().setStrings(usersFile.getName(), "user");
        job.getConfiguration().setStrings(ratingsFile.getName(), "rating");

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(TagAndRecord.class);

        TextInputFormat.addInputPath(job, usersFile);
        TextInputFormat.addInputPath(job, ratingsFile);

        TextOutputFormat.setOutputPath(job, new Path(args[2]));

        job.setMapperClass(ReduceSideJoinMapper.class);
        job.setReducerClass(ReduceSideJoinReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception
    {
        int exitCode = ToolRunner.run(new Configuration(), new ReduceSideJoinDriver(), args);
        System.exit(exitCode);
    }
}
Could you attach your job driver class code? – Rocky111

@Rocky111 Added the driver class code –

Answers


Apparently you can only submit hadoop jobs from a node that has been designated as a gateway node. Once I submitted the job from the gateway node, everything worked as expected.


Make sure you have the following valid configuration files on the Hadoop classpath. By default, the configuration files are read from the directory /etc/hadoop/conf. Putting them in place should be done as part of setting up the hadoop client node.

mapred-site.xml 
yarn-site.xml 
core-site.xml 
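If you end up creating these files by hand, a minimal mapred-site.xml that directs MapReduce at YARN might look like the following (a sketch; the value is the key one for this question, and other properties will depend on your cluster):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- "local" (the built-in default) runs jobs with LocalJobRunner;
       "yarn" submits them to the cluster -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```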

If the above configuration files are empty, you must populate them with the correct properties. This can be done in two ways:

In Cloudera Manager, when you click on the YARN service, the Actions section has a Deploy client configuration option alongside Start, Stop, etc. Use that option to deploy the client configuration.

If the node is not managed by CM and no YARN gateway is configured on it, the above option may not work. In that case use the Download client configuration option instead of Deploy client configuration. Extract the downloaded zip of configuration files (the files listed above) and copy them to /etc/hadoop/conf manually.

To execute the jar, you can use either hadoop or yarn.
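For example, using the jar from the question (either launcher works once the client configuration is in place; yarn jar is the MR2-era form):

```shell
hadoop jar ./hadoop-examples-0.0.1-SNAPSHOT.jar com.aravind.learning.hadoop.mapred.join.ReduceSideJoinDriver tech_talks/users.csv tech_talks/ratings.csv tech_talks/output/ReduceSideJoinDriver/
# or equivalently
yarn jar ./hadoop-examples-0.0.1-SNAPSHOT.jar com.aravind.learning.hadoop.mapred.join.ReduceSideJoinDriver tech_talks/users.csv tech_talks/ratings.csv tech_talks/output/ReduceSideJoinDriver/
```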