I am using Apache Spark to analyze query logs. I had some difficulty setting Spark up, and I am now running a standalone cluster to process the queries. Apache Spark + MySQL: no suitable JDBC driver found.
First I ran the Java word-count example, which works correctly. The problem appears when I try to connect to a MySQL server. I am on Ubuntu 14.04 LTS, 64-bit, with Spark 1.4.1 and MySQL 5.1.
Here is my code. When I use the master URL instead of local[*], I get the error "No suitable driver found". I have included the log below.
import java.io.Serializable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

public class LoadFromDb implements Serializable {
    private static final org.apache.log4j.Logger LOGGER =
            org.apache.log4j.Logger.getLogger(LoadFromDb.class);
    private static final String MYSQL_DRIVER = "com.mysql.jdbc.Driver";
    private static final String MYSQL_USERNAME = "spark";
    private static final String MYSQL_PWD = "spark123";
    private static final String MYSQL_CONNECTION_URL =
            "jdbc:mysql://localhost/productsearch_userinfo?user=" + MYSQL_USERNAME + "&password=" + MYSQL_PWD;

    private static final JavaSparkContext sc =
            new JavaSparkContext(new SparkConf()
                    .setAppName("SparkJdbcDs")
                    .setMaster("spark://shawon-H67MA-USB3-B3:7077"));
    private static final SQLContext sqlContext = new SQLContext(sc);

    public static void main(String[] args) {
        // Data source options
        Map<String, String> options = new HashMap<>();
        options.put("driver", MYSQL_DRIVER);
        options.put("url", MYSQL_CONNECTION_URL);
        options.put("dbtable", "query");
        //options.put("partitionColumn", "sessionID");
        //options.put("lowerBound", "10001");
        //options.put("upperBound", "499999");
        //options.put("numPartitions", "10");

        // Load the MySQL query result as a DataFrame
        DataFrame jdbcDF = sqlContext.load("jdbc", options);
        //jdbcDF.show();
        jdbcDF.select("id", "queryText").show();
    }
}
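Since the failure only occurs with the cluster master URL (the tasks fail on the executors, not in the driver), one common cause is that the MySQL connector jar is on the driver's classpath but never shipped to the executors. A minimal sketch of one way to address that is to register the jar on the SparkConf so Spark distributes it to every executor; the jar path below is an assumption and must point at your local copy of the connector:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class LoadFromDbWithJars {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("SparkJdbcDs")
                .setMaster("spark://shawon-H67MA-USB3-B3:7077")
                // Ship the MySQL connector jar to every executor.
                // This path is hypothetical -- replace it with the real
                // location of mysql-connector-java-5.1.36-bin.jar.
                .setJars(new String[] {
                        "/path/to/mysql-connector-java-5.1.36-bin.jar"
                });
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Same JDBC options as before; the explicit "driver" entry lets
        // Spark load the driver class on the executors before the
        // DriverManager lookup happens.
        Map<String, String> options = new HashMap<>();
        options.put("driver", "com.mysql.jdbc.Driver");
        options.put("url",
                "jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123");
        options.put("dbtable", "query");

        DataFrame jdbcDF = sqlContext.load("jdbc", options);
        jdbcDF.select("id", "queryText").show();
    }
}
```

This sketch assumes a reachable standalone master and a local connector jar, so it cannot run in isolation; the point is only that `setJars` (or the equivalent `spark.jars` property) is what puts the driver class on the executor classpath.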
Any sample project would be a big help. Log:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/08/29 03:38:26 INFO SparkContext: Running Spark version 1.4.1
15/08/29 03:38:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/08/29 03:38:27 WARN Utils: Your hostname, shawon-H67MA-USB3-B3 resolves to a loopback address: 127.0.0.1; using 192.168.1.102 instead (on interface eth0)
15/08/29 03:38:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/08/29 03:38:27 INFO SecurityManager: Changing view acls to: shawon
15/08/29 03:38:27 INFO SecurityManager: Changing modify acls to: shawon
15/08/29 03:38:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(shawon); users with modify permissions: Set(shawon)
15/08/29 03:38:27 INFO Slf4jLogger: Slf4jLogger started
15/08/29 03:38:27 INFO Remoting: Starting remoting
15/08/29 03:38:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:60742]
15/08/29 03:38:27 INFO Utils: Successfully started service 'sparkDriver' on port 60742.
15/08/29 03:38:27 INFO SparkEnv: Registering MapOutputTracker
15/08/29 03:38:27 INFO SparkEnv: Registering BlockManagerMaster
15/08/29 03:38:27 INFO DiskBlockManager: Created local directory at /tmp/spark-85b7b4c4-ed50-4ccf-97fc-25b14ab404b1/blockmgr-57acbba4-d7d4-4557-9e6c-e1acf97d4c88
15/08/29 03:38:27 INFO MemoryStore: MemoryStore started with capacity 473.3 MB
15/08/29 03:38:27 INFO HttpFileServer: HTTP File server directory is /tmp/spark-85b7b4c4-ed50-4ccf-97fc-25b14ab404b1/httpd-a5e6844d-ac3a-4da2-822c-1b98d0a287c4
15/08/29 03:38:27 INFO HttpServer: Starting HTTP Server
15/08/29 03:38:27 INFO Utils: Successfully started service 'HTTP file server' on port 55199.
15/08/29 03:38:27 INFO SparkEnv: Registering OutputCommitCoordinator
15/08/29 03:38:28 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/08/29 03:38:28 INFO SparkUI: Started SparkUI at http://192.168.1.102:4040
15/08/29 03:38:28 INFO AppClient$ClientActor: Connecting to master akka.tcp://[email protected]:7077/user/Master...
15/08/29 03:38:28 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150829033828-0000
15/08/29 03:38:28 INFO AppClient$ClientActor: Executor added: app-20150829033828-0000/0 on worker-20150829033238-192.168.1.102-36976 (192.168.1.102:36976) with 4 cores
15/08/29 03:38:28 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150829033828-0000/0 on hostPort 192.168.1.102:36976 with 4 cores, 512.0 MB RAM
15/08/29 03:38:28 INFO AppClient$ClientActor: Executor updated: app-20150829033828-0000/0 is now RUNNING
15/08/29 03:38:28 INFO AppClient$ClientActor: Executor updated: app-20150829033828-0000/0 is now LOADING
15/08/29 03:38:28 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58874.
15/08/29 03:38:28 INFO NettyBlockTransferService: Server created on 58874
15/08/29 03:38:28 INFO BlockManagerMaster: Trying to register BlockManager
15/08/29 03:38:28 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.102:58874 with 473.3 MB RAM, BlockManagerId(driver, 192.168.1.102, 58874)
15/08/29 03:38:28 INFO BlockManagerMaster: Registered BlockManager
15/08/29 03:38:28 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/08/29 03:38:30 INFO SparkContext: Starting job: show at LoadFromDb.java:43
15/08/29 03:38:30 INFO DAGScheduler: Got job 0 (show at LoadFromDb.java:43) with 1 output partitions (allowLocal=false)
15/08/29 03:38:30 INFO DAGScheduler: Final stage: ResultStage 0(show at LoadFromDb.java:43)
15/08/29 03:38:30 INFO DAGScheduler: Parents of final stage: List()
15/08/29 03:38:30 INFO DAGScheduler: Missing parents: List()
15/08/29 03:38:30 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at show at LoadFromDb.java:43), which has no missing parents
15/08/29 03:38:30 INFO MemoryStore: ensureFreeSpace(4304) called with curMem=0, maxMem=496301506
15/08/29 03:38:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.2 KB, free 473.3 MB)
15/08/29 03:38:30 INFO MemoryStore: ensureFreeSpace(2274) called with curMem=4304, maxMem=496301506
15/08/29 03:38:30 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.2 KB, free 473.3 MB)
15/08/29 03:38:30 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.102:58874 (size: 2.2 KB, free: 473.3 MB)
15/08/29 03:38:30 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874
15/08/29 03:38:30 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at show at LoadFromDb.java:43)
15/08/29 03:38:30 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/08/29 03:38:30 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://[email protected]:56580/user/Executor#1344522225]) with ID 0
15/08/29 03:38:30 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.1.102, PROCESS_LOCAL, 1171 bytes)
15/08/29 03:38:30 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.102:56904 with 265.4 MB RAM, BlockManagerId(0, 192.168.1.102, 56904)
15/08/29 03:38:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.102:56904 (size: 2.2 KB, free: 265.4 MB)
15/08/29 03:38:31 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.102): java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123
at java.sql.DriverManager.getConnection(DriverManager.java:596)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:185)
at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:177)
at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:359)
at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:350)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/08/29 03:38:31 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 1, 192.168.1.102, PROCESS_LOCAL, 1171 bytes)
15/08/29 03:38:31 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) on executor 192.168.1.102: java.sql.SQLException (No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123) [duplicate 1]
15/08/29 03:38:31 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 2, 192.168.1.102, PROCESS_LOCAL, 1171 bytes)
15/08/29 03:38:31 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2) on executor 192.168.1.102: java.sql.SQLException (No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123) [duplicate 2]
15/08/29 03:38:31 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 3, 192.168.1.102, PROCESS_LOCAL, 1171 bytes)
15/08/29 03:38:31 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3) on executor 192.168.1.102: java.sql.SQLException (No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123) [duplicate 3]
15/08/29 03:38:31 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
15/08/29 03:38:31 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/08/29 03:38:31 INFO TaskSchedulerImpl: Cancelling stage 0
15/08/29 03:38:31 INFO DAGScheduler: ResultStage 0 (show at LoadFromDb.java:43) failed in 1.680 s
15/08/29 03:38:31 INFO DAGScheduler: Job 0 failed: show at LoadFromDb.java:43, took 1.840969 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.1.102): java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost/productsearch_userinfo?user=spark&password=spark123
at java.sql.DriverManager.getConnection(DriverManager.java:596)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:185)
at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:177)
at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:359)
at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:350)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/08/29 03:38:31 INFO SparkContext: Invoking stop() from shutdown hook
15/08/29 03:38:31 INFO SparkUI: Stopped Spark web UI at http://192.168.1.102:4040
15/08/29 03:38:31 INFO DAGScheduler: Stopping DAGScheduler
15/08/29 03:38:31 INFO SparkDeploySchedulerBackend: Shutting down all executors
15/08/29 03:38:31 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
15/08/29 03:38:31 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/08/29 03:38:31 INFO Utils: path = /tmp/spark-85b7b4c4-ed50-4ccf-97fc-25b14ab404b1/blockmgr-57acbba4-d7d4-4557-9e6c-e1acf97d4c88, already present as root for deletion.
15/08/29 03:38:31 INFO MemoryStore: MemoryStore cleared
15/08/29 03:38:31 INFO BlockManager: BlockManager stopped
15/08/29 03:38:32 INFO BlockManagerMaster: BlockManagerMaster stopped
15/08/29 03:38:32 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/08/29 03:38:32 INFO SparkContext: Successfully stopped SparkContext
15/08/29 03:38:32 INFO Utils: Shutdown hook called
15/08/29 03:38:32 INFO Utils: Deleting directory /tmp/spark-85b7b4c4-ed50-4ccf-97fc-25b14ab404b1
15/08/29 03:38:32 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
It seems the MySQL JDBC driver is missing. Have you tried connecting to a MySQL database from a plain Java program? You probably need a MySQL JDBC driver that works on your system. – flaschenpost
I have the driver as a jar file named mysql-connector-java-5.1.36-bin.jar. I can connect from plain Java through this jar, but the problem appears when I try to use it with Spark... :( – Shawon91
How do you submit your application? – eliasah
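The question about submission matters because `spark-submit` is the other common way to get the connector onto the executors' classpath. A hedged sketch of such an invocation follows; the jar paths and the application jar name are assumptions, not taken from the question:

```shell
# Pass the MySQL connector to the driver and all executors at submit time.
# Adjust the paths to match where the connector and application jars live.
spark-submit \
  --class LoadFromDb \
  --master spark://shawon-H67MA-USB3-B3:7077 \
  --jars /path/to/mysql-connector-java-5.1.36-bin.jar \
  /path/to/loadfromdb.jar
```

With `--jars`, Spark copies the listed jar to every node for the job, which avoids hard-coding the path in the application code.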