2017-05-05

I'm trying to read data from Amazon Redshift with Spark (Scala), but I get the following error:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: You must specify a method for authenticating Redshift's connection to S3 (aws_iam_role, forward_spark_s3_credentials, or temporary_aws_*. For a discussion of the differences between these options, please see the README. 
at scala.Predef$.require(Predef.scala:224) 
at com.databricks.spark.redshift.Parameters$MergedParameters.<init>(Parameters.scala:91) 
at com.databricks.spark.redshift.Parameters$.mergeParameters(Parameters.scala:83) 
at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:50) 
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330) 
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152) 
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125) 
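The error names three ways to authenticate Redshift's connection to S3, and the reader options in the question choose none of them. The helper below is a minimal sketch that mirrors that requirement in plain Scala. It is hypothetical and is NOT part of spark-redshift. The full names of the `temporary_aws_*` options are an assumption based on the library's README, and this sketch assumes all three must be present.

```scala
// Hypothetical helper, NOT part of spark-redshift: it only mirrors the
// requirement stated in the error message.
object RedshiftAuthCheck {
  // Assumed full names for the "temporary_aws_*" family of options.
  val TemporaryKeys: Seq[String] = Seq(
    "temporary_aws_access_key_id",
    "temporary_aws_secret_access_key",
    "temporary_aws_session_token"
  )

  /** True if at least one S3 auth method named in the error is selected. */
  def hasS3AuthMethod(opts: Map[String, String]): Boolean = {
    val iamRole = opts.contains("aws_iam_role")
    val forward = opts.get("forward_spark_s3_credentials").exists(_.equalsIgnoreCase("true"))
    val temporary = TemporaryKeys.forall(opts.contains) // sketch: all three required
    iamRole || forward || temporary
  }

  // The reader options used in the question: no auth method is chosen.
  val questionOptions: Map[String, String] = Map(
    "url" -> "<jdbc_url>",
    "dbtable" -> "test.account",
    "tempdir" -> "s3n://testBucket/data"
  )

  // The same options with one explicit choice added.
  val fixedOptions: Map[String, String] =
    questionOptions + ("forward_spark_s3_credentials" -> "true")
}
```

A map like `fixedOptions` can be passed in one call with `session.read.format("com.databricks.spark.redshift").options(fixedOptions).load()`. Which method is appropriate depends on where the S3 credentials live. `forward_spark_s3_credentials` only makes sense if Spark itself already has usable S3 keys.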

I'm using the following code to read the data:

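import org.apache.spark.sql.SparkSession

// Local Spark session for the proof of concept
val session = SparkSession.builder()
  .master("local")
  .appName("POC")
  .getOrCreate()

// S3 credentials for the temp directory, set on the session's runtime config
session.conf.set("fs.s3n.awsAccessKeyId", "<access_key>")
session.conf.set("fs.s3n.awsSecretAccessKey", "<secret-key>")

// Read a Redshift table through spark-redshift, staging data in S3
val eventsDF = session.read
  .format("com.databricks.spark.redshift")
  .option("url", "<jdbc_url>")
  .option("dbtable", "test.account")
  .option("tempdir", "s3n://testBucket/data")
  .load()
eventsDF.show()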

build.sbt:

name := "Redshift_read"

scalaVersion := "2.11.8"

version := "1.0"

val sparkVersion = "2.1.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "com.databricks" %% "spark-redshift" % "3.0.0-preview1",
  "com.amazonaws" % "aws-java-sdk" % "1.11.0"
)

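Can anyone help me figure out what I'm missing? I have already provided the access key and secret key in Spark, but I still get the error.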


Aren't 'fs.s3n.awsAccessKeyId' and the like part of the SparkContext configuration rather than the SparkSession's? – Josef


You're right. When I set them on the SparkContext configuration, it worked. Thanks Grant – Rishi
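The comment's point can be checked directly. This is a minimal sketch, assuming Spark 2.x in local mode. `session.conf.set` writes to the session's runtime (SQL) configuration. `sparkContext.hadoopConfiguration` is a separate Hadoop `Configuration` object, so a key set through the former does not appear in the latter. The object name `ConfScopeDemo` and the dummy values are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ConfScopeDemo {
  private val Key = "fs.s3n.awsAccessKeyId"

  /** Returns what the SparkContext's Hadoop configuration holds for `Key`
    * after each kind of `set` call. */
  def probe(session: SparkSession): (Option[String], Option[String]) = {
    val hadoopConf = session.sparkContext.hadoopConfiguration

    // Goes into the session's runtime (SQL) configuration only.
    session.conf.set(Key, "from-session-conf")
    // May be None or a pre-existing value, but not "from-session-conf".
    val afterSessionConf = Option(hadoopConf.get(Key))

    // Goes into the Hadoop Configuration owned by the SparkContext.
    hadoopConf.set(Key, "from-hadoop-conf")
    val afterHadoopConf = Option(hadoopConf.get(Key))

    (afterSessionConf, afterHadoopConf)
  }

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().master("local[1]").appName("ConfScopeDemo").getOrCreate()
    try println(probe(session))
    finally session.stop()
  }
}
```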

Answer


I got it working just by setting the S3 keys on the SparkContext's Hadoop configuration instead of on the SparkSession.

Replace:

session.conf.set("fs.s3n.awsAccessKeyId", "<access_key>") 
session.conf.set("fs.s3n.awsSecretAccessKey", "<secret-key>") 

with:

session.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "<access_key>")
session.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "<secret_key>")
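Putting the answer together, here is a sketch of the corrected read. Two things go beyond the answer and are assumptions. The helper names `setS3nKeys` and `readTable` are hypothetical. The `forward_spark_s3_credentials` option is added because the error message asks for an explicit authentication method, and that option tells spark-redshift to forward Spark's S3 credentials to Redshift.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RedshiftRead {
  /** Hypothetical helper: stores the S3 keys where the answer put them,
    * in the SparkContext's Hadoop configuration. */
  def setS3nKeys(session: SparkSession, accessKey: String, secretKey: String): Unit = {
    val hadoopConf = session.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3n.awsAccessKeyId", accessKey)
    hadoopConf.set("fs.s3n.awsSecretAccessKey", secretKey)
  }

  /** Hypothetical helper: reads one Redshift table via spark-redshift. */
  def readTable(session: SparkSession, jdbcUrl: String, table: String, tempDir: String): DataFrame =
    session.read
      .format("com.databricks.spark.redshift")
      .option("url", jdbcUrl)
      .option("dbtable", table)
      .option("tempdir", tempDir)
      // Assumption: explicitly pick one of the auth methods listed in the error.
      .option("forward_spark_s3_credentials", "true")
      .load()

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().master("local").appName("POC").getOrCreate()
    try {
      setS3nKeys(session, "<access_key>", "<secret_key>")
      readTable(session, "<jdbc_url>", "test.account", "s3n://testBucket/data").show()
    } finally session.stop()
  }
}
```

Keeping the credential setup separate from the read makes it easy to swap in `aws_iam_role` later without touching how the keys are configured.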

And add the following dependency to build.sbt:

resolvers += "redshift" at "http://redshift-maven-repository.s3-website-us-east-1.amazonaws.com/release"

libraryDependencies += "com.amazon.redshift" % "redshift-jdbc42" % "1.2.1.1001"