Spark S3 I/O - [S3ServiceException] S3 HEAD request failed

I want to save and read Spark DataFrames from AWS S3. I have Googled a lot, but have not found much of use.

The code I have written is as follows:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").appName("test").getOrCreate()

// Credentials for the s3n connector
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "**********")
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "********************")

import spark.implicits._

spark.read.textFile("s3n://myBucket/testFile").show(false)

List(1, 2, 3, 4).toDF.write.parquet("s3n://myBucket/test/abc.parquet")

However, when I run it, I get the following error:

org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/myBucket/testFile' - ResponseCode=403, ResponseMessage=Forbidden 
[info] at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleServiceException(Jets3tNativeFileSystemStore.java:245) 
[info] at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:119) 
[info] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
[info] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
[info] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
[info] at java.lang.reflect.Method.invoke(Method.java:498) 
[info] at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) 
[info] at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) 
[info] at org.apache.hadoop.fs.s3native.$Proxy15.retrieveMetadata(Unknown Source) 
[info] at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:414) 
[info] ... 
[info] Cause: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/myBucket/testFile' - ResponseCode=403, ResponseMessage=Forbidden 
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:477) 
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:718) 
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1599) 
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1535) 
[info] at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1987) 
[info] at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1332) 
[info] at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:111) 
[info] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
[info] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
[info] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
[info] ... 
[info] Cause: org.jets3t.service.impl.rest.HttpException: 
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:475) 
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:718) 
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1599) 
[info] at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1535) 
[info] at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1987) 
[info] at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1332) 
[info] at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:111) 
[info] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
[info] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
[info] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
[info] ... 

I am using:

  • Spark: 2.1.0
  • Scala: 2.11.2
  • AWS Java SDK: 1.11.126

Any help is appreciated!

Answers

I have tried the following on Spark version 2.1.1 and it worked fine for me.

Step 1: Download the following jars:
    -- hadoop-aws-2.7.3.jar
    -- aws-java-sdk-1.7.4.jar
    Note:
     If you are not able to find these jars, you can get them from the hadoop-2.7.3 distribution.
Step 2: Place the above jars into $SPARK_HOME/jars/

Step 3: Code:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

val conf = new SparkConf().setMaster("local").setAppName("My App")
val sc = new SparkContext(conf)

// Credentials for the s3a connector
sc.hadoopConfiguration.set("fs.s3a.access.key", "***********")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "******************")

val input = sc.textFile("s3a://mybucket/*.txt")

// toDF needs a SparkSession and its implicits; getOrCreate reuses the existing SparkContext
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
List(1, 2, 3, 4).toDF.write.parquet("s3a://mybucket/abc.parquet")
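
If you build your application with sbt rather than copying jars into $SPARK_HOME/jars/, here is a minimal sketch of the equivalent dependency (an assumption on my part, not from the answer above; the hadoop-aws version must match the Hadoop line your Spark build uses, and hadoop-aws 2.7.3 pulls in aws-java-sdk 1.7.4 transitively):

// build.sbt (sketch) -- hadoop-aws must match your Hadoop version;
// it transitively brings in the matching aws-java-sdk (1.7.4 here)
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.3"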

In the answer, the 'aws-java-sdk' version used throughout is 1.7.4. Can I use 'aws-java-sdk' v1.11.126 instead, since newer versions of 'aws-java-sdk' have many features that are not available in the old v1.7.4? – himanshuIIITian


Sure, you can use it, but if an old feature no longer exists, then it might be a problem. –


Which old feature are you talking about? – himanshuIIITian


Setting the secrets in the Spark conf itself, via the 'spark.hadoop.fs.s3n...' options, makes Spark work with them.
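
A minimal sketch of that approach, using the s3n keys from the question (any key prefixed with "spark.hadoop." is copied by Spark into the underlying Hadoop configuration; with the s3a connector the equivalents would be spark.hadoop.fs.s3a.access.key and spark.hadoop.fs.s3a.secret.key):

import org.apache.spark.sql.SparkSession

// "spark.hadoop."-prefixed keys are forwarded to the Hadoop configuration,
// so there is no need to set them on hadoopConfiguration by hand.
val spark = SparkSession.builder()
  .master("local")
  .appName("s3-conf-example")
  .config("spark.hadoop.fs.s3n.awsAccessKeyId", "**********")
  .config("spark.hadoop.fs.s3n.awsSecretAccessKey", "********************")
  .getOrCreate()

spark.read.textFile("s3n://myBucket/testFile").show(false)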