I am running Spark 2.1.0 on Windows 10. I connect to a MySQL database over JDBC to import data into Spark. As shown below, whenever I execute an action I get the following warning, which makes me suspect that the data is being fetched from the database anew for every action. Does Spark connect to the database every time it executes a transformation/action?
scala> val jdbcDF2 = spark.read.jdbc("jdbc:mysql:dbserver", "schema.tablename", connectionProperties)
Wed Mar 29 15:05:23 IST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
jdbcDF2: org.apache.spark.sql.DataFrame = [id: bigint, site: bigint ... 15 more fields]
scala> jdbcDF2.count
Wed Mar 29 15:09:09 IST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
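As an aside, the warning text itself says how to silence it: explicitly set useSSL. A minimal sketch of the connection setup, where the host, port, and credentials are placeholders of my own:

import java.util.Properties

val connectionProperties = new Properties()
connectionProperties.put("user", "myuser")          // placeholder credentials
connectionProperties.put("password", "mypassword")
connectionProperties.put("useSSL", "false")         // silences the warning, per its own suggestion

val jdbcDF2 = spark.read.jdbc("jdbc:mysql://dbserver:3306/schema", "schema.tablename", connectionProperties)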
If that is the case, is there a way to persist the data into a Spark-local object, such as a DataFrame, so that it doesn't have to connect to the database every time?
I tried to cache the table, and that ran successfully, but then I couldn't query the table with Spark SQL:
scala> jdbcDF2.cache()
res6: jdbcDF2.type = [id: bigint, site: bigint ... 15 more fields]
scala> val unique = sql("SELECT DISTINCT site FROM jdbcDF2")
org.apache.spark.sql.AnalysisException: Table or view not found: jdbcDF2;
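From what I understand, spark.sql resolves table names against the catalog rather than against Scala variable names, so the DataFrame presumably has to be registered as a temporary view first. A minimal sketch of what I mean, assuming Spark 2.x's createOrReplaceTempView:

// Cache and force materialization with an action, so later queries
// read from memory rather than re-opening a JDBC connection.
jdbcDF2.cache()
jdbcDF2.count()

// Register the DataFrame under a name Spark SQL can resolve;
// "jdbcDF2" below is the view name, not the Scala val.
jdbcDF2.createOrReplaceTempView("jdbcDF2")

val unique = spark.sql("SELECT DISTINCT site FROM jdbcDF2")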