Spark每次執行轉換/操作時都會連接到數據庫？

我在Windows 10上運行Spark 2.1.0。我連接到MySQL數據庫以使用JDBC將數據導入spark。如下所示，每當我執行一個操作時，我會得到以下警告，這讓我懷疑數據是從數據庫中爲每個操作檢索的。Spark每次執行轉換/操作時都會連接到數據庫？

scala> val jdbcDF2 = spark.read.jdbc("jdbc:mysql:dbserver", "schema.tablename", connectionProperties) 
Wed Mar 29 15:05:23 IST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification. 
jdbcDF2: org.apache.spark.sql.DataFrame = [id: bigint, site: bigint ... 15 more fields] 

scala> jdbcDF2.count 
Wed Mar 29 15:09:09 IST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.

如果是這樣的話，有沒有辦法讓我可以將數據保存到火花本地對象就像一個數據幀，這樣它沒有連接到數據庫，所有的時間？

我試圖cache the table併成功運行，但我無法使用星火-SQL放在桌子上

scala> jdbcDF2.cache() 
res6: jdbcDF2.type = [id: bigint, site: bigint ... 15 more fields] 
scala> val unique = sql("SELECT DISTINCT site FROM jdbcDF2") 
org.apache.spark.sql.AnalysisException: Table or view not found: jdbcDF2;

來源

2017-03-29 Nagesh

你是對緩存的數據幀以備後用，爲了在不查詢數據庫每個星火行動（收集，統計，第一，...）

但使用SQL查詢您的數據幀，首先你要做的：

jdbcDF2.createOrReplaceTempView("my_table")

和N：

sql("SELECT DISTINCT site FROM my_table")

來源

2017-03-29 11:02:27

您可以直接使用

val unique = jdbcDF2.selectExpr("count(distinct site)")

或

val unique = jdbcDF2.select("site").distinct.count

或創建從您的數據幀的臨時觀點和你的數據幀緩存後執行您查詢通過sqlContext訪問它

jdbcDF2.createOrReplaceTempView("jdbcDF2") 
val unique = sql("SELECT DISTINCT site FROM jdbcDF2")

來源

2017-03-29 11:19:33 FaigB

Spark每次執行轉換/操作時都會連接到數據庫？

回答

相關問題