I'm trying to read data from Amazon Redshift using Spark Scala, but I get the following error when reading from Redshift:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: You must specify a method for authenticating Redshift's connection to S3 (aws_iam_role, forward_spark_s3_credentials, or temporary_aws_*. For a discussion of the differences between these options, please see the README.
    at scala.Predef$.require(Predef.scala:224)
    at com.databricks.spark.redshift.Parameters$MergedParameters.<init>(Parameters.scala:91)
    at com.databricks.spark.redshift.Parameters$.mergeParameters(Parameters.scala:83)
    at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:50)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
I'm using the code below to read the data:
val session = SparkSession.builder()
  .master("local")
  .appName("POC")
  .getOrCreate()

session.conf.set("fs.s3n.awsAccessKeyId", "<access_key>")
session.conf.set("fs.s3n.awsSecretAccessKey", "<secret-key>")

val eventsDF = session.read
  .format("com.databricks.spark.redshift")
  .option("url", "<jdbc_url>")
  .option("dbtable", "test.account")
  .option("tempdir", "s3n://testBucket/data")
  .load()

eventsDF.show()
build.sbt:
name:= "Redshift_read"
scalaVersion:= "2.11.8"
version := "1.0"
val sparkVersion = "2.1.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"com.databricks" %% "spark-redshift" % "3.0.0-preview1",
"com.amazonaws" % "aws-java-sdk" % "1.11.0"
)
Can anyone help me figure out what I'm missing? I have already provided the access key and secret key in Spark, but I still get the error.
Isn't 'fs.s3n.awsAccessKeyId' etc. part of the SparkContext configuration rather than the SparkSession? – Josef
You're right. When I set it on the SparkContext configuration, it worked. Thanks! – Rishi
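
For anyone hitting the same exception, here is a minimal sketch of the fix the comments point to, using the same placeholders as the question (<jdbc_url>, <access_key>, <secret-key>, testBucket). The S3 keys are set on the SparkContext's Hadoop configuration (session.conf.set only affects Spark SQL properties, not the Hadoop filesystem layer), and forward_spark_s3_credentials, one of the three options listed in the error message itself, tells spark-redshift to reuse those keys for Redshift's own access to the tempdir:

import org.apache.spark.sql.SparkSession

object RedshiftRead {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder()
      .master("local")
      .appName("POC")
      .getOrCreate()

    // Set the keys on the Hadoop configuration, not session.conf:
    // this is what the s3n filesystem actually reads credentials from.
    val hadoopConf = session.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3n.awsAccessKeyId", "<access_key>")
    hadoopConf.set("fs.s3n.awsSecretAccessKey", "<secret-key>")

    val eventsDF = session.read
      .format("com.databricks.spark.redshift")
      .option("url", "<jdbc_url>")
      .option("dbtable", "test.account")
      .option("tempdir", "s3n://testBucket/data")
      // Forward the Spark S3 credentials to Redshift; aws_iam_role or
      // temporary_aws_* credentials are the alternatives the error names.
      .option("forward_spark_s3_credentials", "true")
      .load()

    eventsDF.show()
  }
}

With aws_iam_role instead, you would drop the key forwarding and pass .option("aws_iam_role", "<role_arn>"), provided the cluster has that IAM role attached.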