You can use two Map functions, one per dataset, if you want to process each one differently. You need to register both mappers with the job. For example:
public static class FullOuterJoinStdDetMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text>
{
    private String person_name, book_title, file_tag = "person_book#";
    private String emit_value = "";

    public void map(LongWritable key, Text values,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException
    {
        String line = values.toString();
        try
        {
            String[] person_detail = line.split(",");
            person_name = person_detail[0].trim();
            book_title = person_detail[1].trim();
        }
        catch (ArrayIndexOutOfBoundsException e)
        {
            // Malformed record: flag the missing field and clear the
            // title so the previous record's value is not reused.
            person_name = "student name missing";
            book_title = "";
        }
        // Tag the value so the reducer can tell which input it came from.
        emit_value = file_tag + person_name;
        output.collect(new Text(book_title), new Text(emit_value));
    }
}
public static class FullOuterJoinResultDetMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text>
{
    private String author_name, book_title, file_tag = "auth_book#";
    private String emit_value = "";

    public void map(LongWritable key, Text values,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException
    {
        String line = values.toString();
        try
        {
            String[] author_detail = line.split(",");
            author_name = author_detail[1].trim();
            book_title = author_detail[0].trim();
        }
        catch (ArrayIndexOutOfBoundsException e)
        {
            // Malformed record: flag the missing field and clear the
            // title so the previous record's value is not reused.
            author_name = "author name missing";
            book_title = "";
        }
        // Tag the value so the reducer can tell which input it came from.
        emit_value = file_tag + author_name;
        output.collect(new Text(book_title), new Text(emit_value));
    }
}
public static void main(String args[]) throws Exception
{
    if (args.length != 3)
    {
        System.out.println("Input/output paths missing");
        System.exit(-1);
    }
    Configuration conf = new Configuration();
    String[] argum = new GenericOptionsParser(conf, args).getRemainingArgs();
    conf.set("mapred.textoutputformat.separator", ",");
    // Pass the Configuration to the JobConf so the separator setting takes effect.
    JobConf mrjob = new JobConf(conf);
    mrjob.setJobName("Full_Outer_Join");
    mrjob.setJarByClass(FullOuterJoin.class);
    // Register one mapper per input path.
    MultipleInputs.addInputPath(mrjob, new Path(argum[0]), TextInputFormat.class, FullOuterJoinStdDetMapper.class);
    MultipleInputs.addInputPath(mrjob, new Path(argum[1]), TextInputFormat.class, FullOuterJoinResultDetMapper.class);
    FileOutputFormat.setOutputPath(mrjob, new Path(argum[2]));
    mrjob.setReducerClass(FullOuterJoinReducer.class);
    mrjob.setOutputKeyClass(Text.class);
    mrjob.setOutputValueClass(Text.class);
    JobClient.runJob(mrjob);
}
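The FullOuterJoinReducer registered above is not shown. Its core task is to split the values for each book title by the "person_book#" / "auth_book#" tags and pair the two sides, emitting a row even when one side is empty (that is what makes the join *full outer*). Here is a minimal, Hadoop-free sketch of that merge logic; the class and method names are illustrative, not from the original code:

```java
import java.util.ArrayList;
import java.util.List;

public class FullOuterJoinLogic {
    // Merge the tagged values collected for one book title.
    public static List<String> join(String bookTitle, List<String> taggedValues) {
        List<String> persons = new ArrayList<>();
        List<String> authors = new ArrayList<>();
        for (String value : taggedValues) {
            if (value.startsWith("person_book#")) {
                persons.add(value.substring("person_book#".length()));
            } else if (value.startsWith("auth_book#")) {
                authors.add(value.substring("auth_book#".length()));
            }
        }
        // Full outer join: a record with no match on the other side
        // still produces a row, with a blank column for the missing side.
        if (persons.isEmpty()) persons.add("");
        if (authors.isEmpty()) authors.add("");
        List<String> rows = new ArrayList<>();
        for (String p : persons) {
            for (String a : authors) {
                rows.add(bookTitle + "," + p + "," + a);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(join("Hadoop in Action",
                List.of("person_book#Alice", "auth_book#Chuck Lam")));
        System.out.println(join("Unread Book",
                List.of("auth_book#Someone")));
    }
}
```

In a real Reducer<Text, Text, Text, Text> you would iterate the `Iterator<Text>` of values, apply the same tag-splitting, and call `output.collect` once per joined row.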
This might help: http://stackoverflow.com/questions/4593243/hadoop-job-taking-input-files-from-multiple-directories – Nija 2011-04-11 10:14:22
I would suggest you use either Pig or Hive (Google them). Then implement it as a join from the user profiles to the song data. I would also look into Mahout, the Hadoop machine-learning library. Implementing joins in Hadoop through its native Java API is really annoying. – 2011-04-11 22:39:26
Thx Spike ... Mahout does have a precomputed diff-matrix implementation for SlopeOne, but it doesn't provide a complete Hadoop version of the SlopeOne algorithm. I'll try Hive anyway. Thanks for the suggestion – 2011-04-16 11:57:40