
I am currently trying to figure out what happens when a MapReduce job runs by putting some System.out.println() statements at certain places in the code, but those print statements never show up in my terminal while the job is running. Can someone help me figure out what I am doing wrong? The MapReduce job is not displaying my print statements on the terminal.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountJob {
    public static int iterations;

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            System.out.println("blalblbfbbfbbbgghghghghghgh");
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                // Read each token exactly once; calling nextToken() twice per
                // pass would skip tokens and throw NoSuchElementException on
                // lines with an odd number of tokens.
                String myWord = itr.nextToken();
                for (int n = 0; n < 5; n++) {
                    myWord = myWord + "Test my appending words";
                }
                System.out.println("Print my word: " + myWord);
                word.set(myWord);
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        if (args.length != 3) {
            System.err.println("Usage: WordCountJob <in> <out> <iterations>");
            System.exit(2);
        }
        iterations = Integer.parseInt(args[2]);
        Path inPath = new Path(args[0]);
        Path outPath = null;
        // Chain the jobs: each iteration reads the previous iteration's output.
        for (int i = 0; i < iterations; ++i) {
            System.out.println("Iteration number: " + i);
            outPath = new Path(args[1] + i);
            Job job = new Job(conf, "WordCountJob");
            job.setJarByClass(WordCountJob.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, inPath);
            FileOutputFormat.setOutputPath(job, outPath);
            job.waitForCompletion(true);
            inPath = outPath;
        }
    }
}

Answers

That depends on how you submit your job; I assume you submit it with bin/hadoop jar yourJar.jar?

Your System.out.println() is only visible in your main method, because the mapper/reducer runs inside Hadoop in a different JVM, and all of its output is redirected to special per-task log files (stdout/log files). I would suggest using Apache Commons Logging instead:

Log log = LogFactory.getLog(YOUR_MAPPER_CLASS.class);

and then log some information with it:

log.info("Your message"); 

If you run in "local" mode, you can see this log output in your shell; otherwise the log is stored somewhere on the machine where the task was executed. Use the jobtracker's web UI to look at these log files; it is very convenient. By default the jobtracker runs on port 50030.
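
Putting this together, a minimal sketch of a mapper that logs via Commons Logging might look like the following (the class name and log message are illustrative, not from the original post); the log.info() calls end up in the task attempt's log files rather than on the terminal that submitted the job:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LoggingTokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    // Commons Logging ships with Hadoop, so no extra jars are required.
    private static final Log LOG = LogFactory.getLog(LoggingTokenizerMapper.class);

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            // Written to the task attempt's log files, viewable via the
            // jobtracker web UI, not on the client terminal.
            LOG.info("Processing token: " + token);
            word.set(token);
            context.write(word, one);
        }
    }
}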


Minor edit/suggestion: SLF4J seems to be more common nowadays because of its static binding – jayunit100


@jayunit100 Yes. One cool thing about commons-logging is that, since Hadoop itself uses it, the jar is already on the classpath. SLF4J would have to be added via libjars. –
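
(For reference, a hypothetical submission using -libjars could look like the line below; it assumes the driver parses generic options, e.g. via ToolRunner, which the code in the question does not do yet:)

hadoop jar yourJar.jar WordCountJob -libjars slf4j-api.jar,slf4j-log4j12.jar <in> <out> <iterations>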


Alternatively, you can use the MultipleOutputs class and redirect all of your log data into one output file (a log file).

MultipleOutputs<Text, Text> mos = new MultipleOutputs<Text, Text>(context); 
Text tKey = new Text("key"); 
Text tVal = new Text("log message"); 
mos.write(tKey, tVal, <LOG_FILE>);
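
As a rough sketch (assuming the newer org.apache.hadoop.mapreduce API and a file-based output format such as the default TextOutputFormat), the MultipleOutputs instance is typically created in setup() and closed in cleanup(). The class name and the "logs" base output path below are illustrative; the side files land under the job's output directory as logs-m-00000 and so on:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class SideFileLoggingMapper
        extends Mapper<Object, Text, Text, Text> {

    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Write a "log line" to a side file instead of stdout; "logs" is a
        // hypothetical base output path, producing files like logs-m-00000.
        mos.write(new Text("DEBUG"), new Text("saw record: " + value), "logs");
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close();  // flush and close the side files
    }
}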