Here is another way to do it in Java, passing a string buffer directly to AWS S3 putObject.
... AmazonS3 s3Client;

public void reduce(Text key, java.lang.Iterable<Text> values, Reducer<Text, Text, Text, Text>.Context context) throws Exception {
    // One S3 object per reduce key: nightly-dump/<UTC date>/<key>-<random UUID>
    UUID fileUUID = UUID.randomUUID();
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
    sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
    String fileName = String.format("nightly-dump/%s/%s-%s", sdf.format(new Date()), key, fileUUID);
    log.info("Filename = [{}]", fileName);

    // Collect every value for this key into one newline-terminated string
    StringBuilder content = new StringBuilder();
    int count = 0;
    for (Text value : values) {
        count++;
        content.append(value.toString()).append('\n');
    }
    log.info("Count = {}, S3Lines = \n{}", count, content);

    // putObject(bucket, key, content) uploads the string as the object body
    PutObjectResult putObjectResult = s3Client.putObject(S3_BUCKETNAME, fileName, content.toString());
    log.info("Put versionId = {}", putObjectResult.getVersionId());

    reduceWriteContext("1", "1");
    context.setStatus("COMPLETED");
}
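
The key naming and content assembly can be tried on their own, without S3 or Hadoop on the classpath. Below is a minimal sketch of just those two steps. The class name `S3KeySketch` and the helpers `buildKey` and `joinLines` are hypothetical names made up for illustration; they are not part of the AWS SDK or Hadoop. `Locale.ROOT` is an added assumption so that the date digits do not depend on the machine's default locale.

```java
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;
import java.util.UUID;

public class S3KeySketch {

    // Hypothetical helper: builds "nightly-dump/<UTC date>/<key>-<uuid>".
    static String buildKey(Date date, String key, UUID uuid) {
        // Locale.ROOT is an assumption, used to keep the output locale-independent.
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return String.format("nightly-dump/%s/%s-%s", sdf.format(date), key, uuid);
    }

    // Hypothetical helper: joins all values into one newline-terminated string.
    static String joinLines(Iterable<String> values) {
        StringBuilder content = new StringBuilder();
        for (String value : values) {
            content.append(value).append('\n');
        }
        return content.toString();
    }

    public static void main(String[] args) {
        // Fixed UUID and date so that repeated runs print the same key.
        UUID uuid = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        System.out.println(buildKey(new Date(0L), "user42", uuid));
        System.out.print(joinLines(Arrays.asList("a", "b")));
    }
}
```

In the reducer, the string returned by `joinLines` would be what gets passed as the third argument to `putObject`. Since the whole body is held in memory, this probably only suits keys with a modest number of values.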
By the way, why not use [DistributedCache](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/filecache/DistributedCache.html)? It is just as portable as the approach you are using, but may be more useful for long-running jobs. – aldrinleal