I am unable to run this groovy file with Hadoop 2.7.1. The procedure I followed is:
- install gradle
- generate the jar file using gradle. I asked this question, which helped me set up the Hadoop dependencies in gradle as usual
- run it the way we usually run a java jar file, using this command from the folder where the jar is located:
hadoop jar buildSrc-1.0.jar in1 out4
where in1 is the input file and out4 is the output folder in HDFS.
Edit - as the link above is broken, I am pasting the groovy file here.
import org.apache.hadoop.conf.Configured
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.Mapper
import org.apache.hadoop.mapreduce.Reducer
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.hadoop.util.Tool
import org.apache.hadoop.util.ToolRunner

class CountGroovyJob extends Configured implements Tool {
    @Override
    int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "StartsWithCount")
        job.setJarByClass(getClass())

        // configure input source
        TextInputFormat.addInputPath(job, new Path(args[0]))
        job.setInputFormatClass(TextInputFormat)

        // configure mapper, combiner and reducer (the nested classes below)
        job.setMapperClass(GroovyMapper)
        job.setCombinerClass(GroovyReducer)
        job.setReducerClass(GroovyReducer)

        // configure output
        TextOutputFormat.setOutputPath(job, new Path(args[1]))
        job.setOutputFormatClass(TextOutputFormat)
        job.setOutputKeyClass(Text)
        job.setOutputValueClass(IntWritable)

        return job.waitForCompletion(true) ? 0 : 1
    }

    static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new CountGroovyJob(), args))
    }

    static class GroovyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable countOne = new IntWritable(1)
        private final Text reusableText = new Text()

        @Override
        protected void map(LongWritable key, Text value, Mapper.Context context) {
            // emit (token, 1) for every whitespace-separated token in the line
            value.toString().tokenize().each {
                reusableText.set(it)
                context.write(reusableText, countOne)
            }
        }
    }

    static class GroovyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable outValue = new IntWritable()

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Reducer.Context context) {
            // sum all counts collected for this key
            outValue.set(values.collect({ it.value }).sum())
            context.write(key, outValue)
        }
    }
}
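The mapper/reducer logic above can be illustrated without a cluster. The following is a plain-Java sketch of the same tokenize-and-sum behaviour; the class name `WordCountSketch` and its helper methods are my own illustration, not part of the Hadoop job or its API:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Mirrors GroovyMapper: emit a (token, 1) pair for each
    // whitespace-separated token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Mirrors GroovyReducer (and the combiner): sum the counts
    // grouped under each key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[]{"a b a", "b c"}) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // {a=2, b=2, c=1}
    }
}
```

On a real cluster Hadoop performs the grouping and shuffling between the two phases; this sketch only shows the per-record arithmetic the job carries out.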
Thanks. The most upvoted answer asks us to update pom.xml, but where is it? I looked in '/usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop' and also in '/usr/local/Cellar/hadoop/2.7.1/libexec/sbin', but could not find it. Also, any idea how to run the groovy file I linked to? – user1207289
pom.xml is the Maven configuration file used to build a library. It is what the author of that old library used to build the jars available on the website. Unfortunately, it looks like he did not share his source code (I could not find it), so you cannot modify that pom.xml and rebuild/fix it. – Nicomak
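For reference: since the question builds with gradle rather than Maven, the counterpart of a pom.xml in that setup is the project's build.gradle, which lives in the project root (not under the Hadoop install directory). A minimal sketch declaring the Hadoop dependency follows; the exact versions and coordinates are assumptions, not taken from the original project:

```groovy
// Hypothetical minimal build.gradle for a Groovy Hadoop job.
// Versions are assumptions; match them to your installed Hadoop/Groovy.
apply plugin: 'groovy'

repositories {
    mavenCentral()
}

dependencies {
    compile 'org.codehaus.groovy:groovy-all:2.4.5'
    compile 'org.apache.hadoop:hadoop-client:2.7.1'
}
```

Running `gradle jar` with such a file produces the jar under `build/libs`, which is then submitted with `hadoop jar` as in the question.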