2017-08-01 29 views
0

I have a Spark (2.1.0) job that uses the Postgres JDBC driver as described here: https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases — and I would like to include the JDBC jar via its Maven coordinates.

I am using the DataFrame writer like this:

val jdbcURL = s"jdbc:postgresql://${config.pgHost}:${config.pgPort}/${config.pgDatabase}?user=${config.pgUser}&password=${config.pgPassword}" 
val connectionProperties = new Properties() 
connectionProperties.put("user", config.pgUser) 
connectionProperties.put("password", config.pgPassword) 
dataFrame.write.mode(SaveMode.Overwrite).jdbc(jdbcURL, tableName, connectionProperties) 

I successfully included the driver by manually downloading https://jdbc.postgresql.org/download/postgresql-42.1.1.jar and passing --jars postgresql-42.1.1.jar --driver-class-path postgresql-42.1.1.jar.

But I would rather not have to download the JDBC driver first.

I have tried --jars https://jdbc.postgresql.org/download/postgresql-42.1.1.jar, but that fails with:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: http 
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584) 
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) 
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) 
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) 
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) 
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) 
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) 
    at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:364) 
    at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:480) 
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$8.apply(Client.scala:600) 
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$8.apply(Client.scala:599) 
    at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74) 
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:599) 
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:598) 
    at scala.collection.immutable.List.foreach(List.scala:381) 
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:598) 
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:868) 
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:170) 
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1154) 
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1213) 
    at org.apache.spark.deploy.yarn.Client.main(Client.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738) 
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187) 
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 

I have also tried:

Including "org.postgresql" % "postgresql" % "42.1.1" in my build.sbt file

The spark-submit options --repositories https://mvnrepository.com/artifact --packages org.postgresql:postgresql:42.1.1

The spark-submit options --repositories https://mvnrepository.com/artifact --conf "spark.jars.packages=org.postgresql:postgresql:42.1.1"

Each of these fails in the same way:

17/08/01 13:14:49 ERROR yarn.ApplicationMaster: User class threw exception: java.sql.SQLException: No suitable driver 
java.sql.SQLException: No suitable driver 
    at java.sql.DriverManager.getDriver(DriverManager.java:315) 
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$7.apply(JDBCOptions.scala:84) 
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$7.apply(JDBCOptions.scala:84) 
    at scala.Option.getOrElse(Option.scala:121) 
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:83) 
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:34) 
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53) 
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426) 
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215) 
    at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446) 

Answers

0

Specify the driver option in your JDBC connection properties, just as you do for user and password.
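A minimal sketch of that suggestion, reusing the jdbcURL, connectionProperties, dataFrame, and tableName values from the question:

```scala
// Name the driver class explicitly so Spark's JDBC data source loads it
// directly instead of relying on java.sql.DriverManager discovery, which
// can fail with "No suitable driver" when the jar arrives via --packages.
connectionProperties.put("driver", "org.postgresql.Driver")

dataFrame.write
  .mode(SaveMode.Overwrite)
  .jdbc(jdbcURL, tableName, connectionProperties)
```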

4

You can copy the JDBC jar file into the jars folder inside your Spark directory and deploy the application without the --jars option.
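A sketch of that deployment, assuming SPARK_HOME points at your Spark installation; the main class and application jar names below are placeholders:

```shell
# Put the driver on the classpath of every Spark JVM by dropping it
# into Spark's own jars directory.
cp postgresql-42.1.1.jar "$SPARK_HOME/jars/"

# Then submit without --jars or --driver-class-path
# (com.example.MyJob and my-assembly.jar are illustrative).
spark-submit --class com.example.MyJob my-assembly.jar
```

The trade-off is that this modifies the Spark installation itself, so it has to be repeated on each cluster rather than shipped with the application.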

+0

Cool, this works. I would still prefer to bundle it in the project itself (so I can distribute my jar and run command without an install step), but this is better than what I was doing. – ben