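Spark JDBC to DashDB (DB2) with CLOB error
I am connecting my Spark application to DashDB. Currently, I can load my data just fine.
However, I am unable to save a DataFrame to DashDB.
Any insight would be helpful.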
var jdbcSets = sqlContext.read.format("jdbc").options(Map("url" -> url, "driver" -> driver, "dbtable" -> "setsrankval")).load()
jdbcSets.registerTempTable("setsOpponentRanked")
jdbcSets = jdbcSets.coalesce(10)
sqlContext.cacheTable("setsOpponentRanked")
However, when I try to save large DataFrames, I get this error:
DB2 SQL Error: SQLCODE=-1666, SQLSTATE=42613, SQLERRMC=CLOB, DRIVER=4.19.26
The code I use to save the data is as follows:
import java.util.Properties
import org.apache.spark.sql.SaveMode

// Connection properties passed to the JDBC writer
val writeproperties = new Properties()
writeproperties.setProperty("user", "dashXXXX")
writeproperties.setProperty("password", "XXXXXX")
writeproperties.setProperty("rowId", "false")
writeproperties.setProperty("driver", "com.ibm.db2.jcc.DB2Driver")
results.write.mode(SaveMode.Overwrite).jdbc(writeurl, "players_stat_temp", writeproperties)
A sample row from the test data set:
println("Test set: "+results.first())
Test set: ['Damir DZUMHUR','test','test','test','test','test','test','test','test','test','test','test','test','test','test','test','test','test','test','test','test','test',null,null,null,null,null,null,null]
The DataFrame schema is as follows:
root
|-- PLAYER: string (nullable = true)
|-- set01: string (nullable = true)
|-- set02: string (nullable = true)
|-- set12: string (nullable = true)
|-- set01weakseed: string (nullable = true)
|-- set01medseed: string (nullable = true)
|-- set01strongseed: string (nullable = true)
|-- set02weakseed: string (nullable = true)
|-- set02medseed: string (nullable = true)
|-- set02strongseed: string (nullable = true)
|-- set12weakseed: string (nullable = true)
|-- set12medseed: string (nullable = true)
|-- set12strongseed: string (nullable = true)
|-- set01weakrank: string (nullable = true)
|-- set01medrank: string (nullable = true)
|-- set01strongrank: string (nullable = true)
|-- set02weakrank: string (nullable = true)
|-- set02medrank: string (nullable = true)
|-- set02strongrank: string (nullable = true)
|-- set12weakrank: string (nullable = true)
|-- set12medrank: string (nullable = true)
|-- set12strongrank: string (nullable = true)
|-- minibreak: string (nullable = true)
|-- minibreakweakseed: string (nullable = true)
|-- minibreakmedseed: string (nullable = true)
|-- minibreakstrongseed: string (nullable = true)
|-- minibreakweakrank: string (nullable = true)
|-- minibreakmedrank: string (nullable = true)
|-- minibreakstrongrank: string (nullable = true)
I have looked at the JDBC DB2Dialect and saw that StringType is mapped to CLOB. I am wondering whether the following would help:
private object DB2CustomDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:db2")
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Option(JdbcType("VARCHAR(10000)", java.sql.Types.VARCHAR))
    case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
    case _ => None
  }
}
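A dialect like the one above has no effect until Spark knows about it. The following is a minimal sketch of how it might be registered, assuming Spark's `JdbcDialects.registerDialect` is called before the write. The wrapper object `DashDbDialectSetup` and its `register` method are hypothetical names used only for illustration, and the `VARCHAR(10000)` width is simply the value from the snippet above, not a recommendation.

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types.{BooleanType, DataType, StringType}

// Same mapping as in the question: send strings as VARCHAR instead of CLOB.
object DB2CustomDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:db2")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType  => Option(JdbcType("VARCHAR(10000)", java.sql.Types.VARCHAR))
    case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
    case _           => None // fall back to Spark's other mappings
  }
}

// Hypothetical helper: call once on the driver before any JDBC write.
object DashDbDialectSetup {
  def register(): Unit = JdbcDialects.registerDialect(DB2CustomDialect)
}

// Intended use with the names from the question (not executed here):
//   DashDbDialectSetup.register()
//   results.write.mode(SaveMode.Overwrite).jdbc(writeurl, "players_stat_temp", writeproperties)
```

Registering the dialect only changes the column types Spark uses when it creates the target table, so it matters for writes such as `SaveMode.Overwrite` that issue a `CREATE TABLE`.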
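I have exactly the same problem, but I am using PySpark. How can I fix this?
You can apply this fix in a PySpark notebook by using PixieDust's Scala bridge feature. I wrote a blog post about the whole problem and its solution, including a link to a sample notebook: http://datascience.ibm.com/blog/working-with-dashdb-in-data-science-experience/
I had seen that article before. I am actually using IBM's spark-submit rather than Notebooks/DSX. Are you saying I need to apply the fix to my script locally and then submit it to the Spark cluster? Since the Spark cluster is a managed service, are all of these dependencies already installed?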