I implemented a Spark application, but Spark cannot read/write data from S3 (ResponseCode = 400, ResponseMessage = Bad Request). This is how I create the Spark context:
private JavaSparkContext createJavaSparkContext() {
    SparkConf conf = new SparkConf();
    conf.setAppName("test");
    if (conf.get("spark.master", null) == null) {
        conf.setMaster("local[4]");
    }
    conf.set("fs.s3a.awsAccessKeyId", getCredentialConfig().getS3Key());
    conf.set("fs.s3a.awsSecretAccessKey", getCredentialConfig().getS3Secret());
    conf.set("fs.s3a.endpoint", getCredentialConfig().getS3Endpoint());
    return new JavaSparkContext(conf);
}
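As an aside, the credential property names that the s3a connector actually reads are `fs.s3a.access.key` and `fs.s3a.secret.key`, and Hadoop filesystem options set on a `SparkConf` generally need the `spark.hadoop.` prefix to be forwarded into the Hadoop `Configuration`. A sketch of the same method with those conventions applied (`getCredentialConfig()` is the helper from my code above):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

private JavaSparkContext createJavaSparkContext() {
    SparkConf conf = new SparkConf();
    conf.setAppName("test");
    if (conf.get("spark.master", null) == null) {
        conf.setMaster("local[4]");
    }
    // s3a looks up fs.s3a.access.key / fs.s3a.secret.key; the spark.hadoop.
    // prefix makes Spark copy these entries into the Hadoop Configuration.
    conf.set("spark.hadoop.fs.s3a.access.key", getCredentialConfig().getS3Key());
    conf.set("spark.hadoop.fs.s3a.secret.key", getCredentialConfig().getS3Secret());
    conf.set("spark.hadoop.fs.s3a.endpoint", getCredentialConfig().getS3Endpoint());
    return new JavaSparkContext(conf);
}
```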
Then I try to read data from S3 via the Spark Dataset API (Spark SQL):
String s = "s3a://" + getCredentialConfig().getS3Bucket();
Dataset<Row> csv = getSparkSession()
        .read()
        .option("header", "true")
        .csv(s + "/dataset.csv");
System.out.println("Read size :" + csv.count());
This fails with the following error:
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 1A3E8CBD4959289D, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: Q1Fv8sNvcSOWGbhJSu2d3Nfgow00388IpXiiHNKHz8vI/zysC8V8/YyQ1ILVsM2gWQIyTy1miJc=
Hadoop version: 2.7
AWS endpoint: s3.eu-central-1.amazonaws.com
(With Hadoop 2.8 everything works fine.)
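For context: eu-central-1 (Frankfurt) accepts only AWS Signature Version 4 requests, and the AWS SDK bundled with Hadoop 2.7 does not use V4 by default, which matches the 400 Bad Request here. A commonly suggested workaround (an assumption about the cause, not a confirmed fix) is to pin the regional endpoint and enable V4 through a JVM system property on both driver and executors; `com.example.Main` and `app.jar` below are placeholders:

```shell
# Hypothetical spark-submit invocation: force the eu-central-1 endpoint and
# switch the AWS SDK to Signature V4 on the driver and the executors.
spark-submit \
  --conf spark.hadoop.fs.s3a.endpoint=s3.eu-central-1.amazonaws.com \
  --conf spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true \
  --conf spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true \
  --class com.example.Main app.jar
```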