Hadoop: Mapper for (key, value) pairs with a single value (Java)
Example of counting duplicates: (user, (logincount, commentcount))
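// outUserID (Text) and outCount (UserCount) are reusable fields of the Mapper class.
@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // Log line format: timestamp,activity,user,...
    String[] stringData = value.toString().split(",");
    if (stringData.length < 3) {
        return; // skip malformed lines
    }
    String activity = stringData[1];
    String userID = stringData[2];
    if (activity.equals("login")) {
        outCount.set(1, 0); // one login, no comment
    } else if (activity.equals("comment")) {
        outCount.set(0, 1); // no login, one comment
    } else {
        return; // ignore other activities instead of re-emitting a stale count
    }
    outUserID.set(userID);
    context.write(outUserID, outCount);
}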
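I count the logins and comments of a user. Now I want to change the counting: still count every login, but only check whether the user wrote a comment at all. How can I make my Mapper or Reducer look for just one comment per user and "ignore" all other comments (of that user)?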
Edit:
Output of my MapReduce job at the moment:
User14133 Logins: 2 Comments: 2
User76892 Logins: 1 Comments: 0
Mapper and Reducer types, and the value class:
Mapper<LongWritable, Text, Text, UserCount>
Reducer<Text, UserCount, Text, UserCount>
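public static class UserCount implements Writable {
    // Start with zero logins and zero comments.
    public UserCount() {
        set(new IntWritable(0), new IntWritable(0));
    }
    // fields, set(...), write(...) and readFields(...) not shown
}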
Log file (input):
2013-01-01T16:50:56.056+0100,login,User14133,somedata,somedata
2013-01-01T16:55:56.056+0100,login,User14133,somedata,somedata
2013-01-01T05:20:44.044+0100,comment,User14133,somedata,somedata,{text: "something here"}
2013-01-01T05:24:44.044+0100,comment,User14133,somedata,somedata,{text: "something here"}
2013-01-01T20:50:13.013+0100,login,User76892,somedata,somedata
MapReduce counts every login and every comment of a user and sums them up. What I want to achieve is this output:
User14133 Logins: 2 Comments: 0 or 1 (Did the user write at least one comment?)*
* In Mapper or Reducer (?)
for every line in the log {
    if (user wrote comment) {
        return 1;
        ignore all other comments from same user in this log;
    } else if (user didn't write anything) return 0;
}
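One way to read the pseudocode above: keep emitting (1,0) for a login and (0,1) for a comment, and change only how the values of one user are combined, so that logins are summed while the comment field is capped at 1. The following is a minimal plain-Java sketch of that idea, without Hadoop, so it can be run on its own. The class name CommentFlagSketch, the int[] {logins, commentFlag} stand-in for UserCount, and the methods merge and count are all hypothetical and not part of the code in the question.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CommentFlagSketch {

    // Hypothetical stand-in for UserCount: index 0 = logins, index 1 = comment flag.
    // Logins are summed; the comment flag is capped at 1 (max acts as a logical OR).
    public static int[] merge(int[] a, int[] b) {
        return new int[] { a[0] + b[0], Math.max(a[1], b[1]) };
    }

    // Simulates map + reduce in memory: each line yields {1,0} or {0,1},
    // and all values of the same user are combined with merge().
    public static Map<String, int[]> count(List<String> lines) {
        Map<String, int[]> perUser = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length < 3) {
                continue; // skip malformed lines
            }
            int[] emitted;
            if (fields[1].equals("login")) {
                emitted = new int[] { 1, 0 };
            } else if (fields[1].equals("comment")) {
                emitted = new int[] { 0, 1 };
            } else {
                continue; // ignore other activities
            }
            perUser.merge(fields[2], emitted, CommentFlagSketch::merge);
        }
        return perUser;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "2013-01-01T16:50:56.056+0100,login,User14133,somedata,somedata",
            "2013-01-01T16:55:56.056+0100,login,User14133,somedata,somedata",
            "2013-01-01T05:20:44.044+0100,comment,User14133,somedata,somedata,{text: \"something here\"}",
            "2013-01-01T05:24:44.044+0100,comment,User14133,somedata,somedata,{text: \"something here\"}",
            "2013-01-01T20:50:13.013+0100,login,User76892,somedata,somedata");
        count(log).forEach((user, c) ->
            System.out.println(user + " Logins: " + c[0] + " Comments: " + c[1]));
        // prints:
        // User14133 Logins: 2 Comments: 1
        // User76892 Logins: 1 Comments: 0
    }
}
```

Because this merge is associative and commutative, the same rule could presumably be applied inside a Reducer (and a Combiner) that iterates over the UserCount values of a key; the Mapper would not need to remember which users it has already seen.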
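What are your output key and output value types? If you could provide a set of input values and the output you expect for them, we could probably help you better. – aa8y 2013-03-04 12:21:21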
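What do you mean by "check every login"? Also, as asked above, it would be great if you could just provide sample input and the corresponding sample output... – Amar 2013-03-04 15:12:11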
I edited my question, I hope you understand what I mean :) – JustTheAverageGirl 2013-03-04 16:40:00