
Loading external jars into spark-notebook fails

I am trying to connect to Redshift from my laptop, and so far I have done the following:

Configured the notebook metadata:

"customDeps": [ 
    "com.databricks:spark-redshift_2.10:3.0.0-preview1", 
    "com.databricks:spark-avro_2.11:3.2.0", 
    "com.databricks:spark-csv_2.11:1.5.0" 
] 

Checked the browser console after restarting the kernel to make sure the libraries had loaded:

ui-logs-1422> [Tue Aug 22 2017 09:46:26 GMT+0530 (IST)] [notebook.util.CoursierDeps$] Fetched artifact to:/Users/xxxx/.m2/repository/com/databricks/spark-avro_2.10/3.0.0/spark-avro_2.10-3.0.0.jar 
kernel.js:978 ui-logs-1452> [Tue Aug 22 2017 09:46:26 GMT+0530 (IST)] [notebook.util.CoursierDeps$] Fetched artifact to:/Users/xxxx/.coursier/cache/v1/http/repo1.maven.org/maven2/com/databricks/spark-redshift_2.10/3.0.0-preview1/spark-redshift_2.10-3.0.0-preview1.jar 
kernel.js:978 ui-logs-1509> [Tue Aug 22 2017 09:46:26 GMT+0530 (IST)] [notebook.util.CoursierDeps$] Fetched artifact to:/Users/xxxx/.coursier/cache/v1/http/repo1.maven.org/maven2/com/databricks/spark-csv_2.11/1.5.0/spark-csv_2.11-1.5.0.jar 
kernel.js:978 ui-logs-1526> [Tue Aug 22 2017 09:46:26 GMT+0530 (IST)] [notebook.util.CoursierDeps$] Fetched artifact to:/Users/xxxx/.coursier/cache/v1/http/repo1.maven.org/maven2/com/databricks/spark-avro_2.11/3.2.0/spark-avro_2.11-3.2.0.jar 
When I try to load a table, I run into a ClassNotFoundException.
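The read call looks roughly like the sketch below (the format name is the one the stack trace fails to resolve; the URL, table name, and tempdir are placeholder values, and spark is the SparkSession):

// Minimal sketch of a spark-redshift table load; every option value here
// is a placeholder, not a value from the actual notebook.
val df = spark.read
  .format("com.databricks.spark.redshift")   // the data source that lookupDataSource cannot find
  .option("url", "jdbc:redshift://host:5439/dev?user=xxxx&password=xxxx")
  .option("dbtable", "my_table")             // the Redshift table being loaded
  .option("tempdir", "s3n://my-bucket/tmp")  // S3 staging directory used by spark-redshift
  .load()

The load then fails with: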
java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.redshift. Please find packages at http://spark.apache.org/third-party-projects.html 
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:594) 
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86) 
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86) 
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325) 
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152) 
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125) 
    ... 63 elided 
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.redshift.DefaultSource 
    at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:579) 
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:579) 
    at scala.util.Try$.apply(Try.scala:192) 
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:579) 
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:579) 
    at scala.util.Try.orElse(Try.scala:84) 
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:579) 

Has anyone else run into this issue or solved it?

I noticed a similar problem with another dependency as well. Is anything missing in my configuration?

While trying the time series sample notebook (notebooks/timeseries/Spark-Timeseries.snb.ipynb), I noticed an existing entry in the metadata for a custom dependency:

"customDeps": [ 
    "com.cloudera.sparkts % sparkts % 0.3.0" 
    ] 

I quickly verified the package's availability at https://spark-packages.org/package/sryza/spark-timeseries and updated the metadata to include this line:

"com.cloudera.sparkts:sparkts:0.4.1" 

After restarting the kernel, I verified that the library had loaded:

ui-logs-337> [Wed Aug 23 2017 09:29:25 GMT+0530 (IST)] [notebook.util.CoursierDeps$] Will fetch these customDeps artifacts:Set(Dependency(com.cloudera.sparkts:sparkts,0.3.0,,Set(),Attributes(,),false,true), Dependency(com.cloudera.sparkts:sparkts,0.4.1,,Set(),Attributes(,),false,true)) 
kernel.js:978 ui-logs-347> [Wed Aug 23 2017 09:29:37 GMT+0530 (IST)] [notebook.util.CoursierDeps$] Fetched artifact to:/Users/xxxx/.coursier/cache/v1/http/repo1.maven.org/maven2/com/cloudera/sparkts/sparkts/0.4.1/sparkts-0.4.1.jar 
Error message:

<console>:69: error: object cloudera is not a member of package com 
     import com.cloudera.sparkts._ 
       ^
<console>:70: error: object cloudera is not a member of package com 
     import com.cloudera.sparkts.stats.TimeSeriesStatisticalTests 

Answer


Downgraded to a different version of spark-notebook (not the build from the master branch):

spark-notebook-0.7.0-scala-2.11.8-spark-2.1.1-hadoop-2.7.2 
instead of
spark-notebook-0.9.0-SNAPSHOT-scala-2.11.8-spark-2.1.1-hadoop-2.7.2 

I also had to make sure the Scala, Spark, and Hadoop versions were consistent across the dependencies I had configured; note that the customDeps in the question mix Scala 2.10 and 2.11 artifacts (see the sketch below). In this particular case I additionally had to supply the Amazon Redshift JDBC driver jar from the command line (the export line after the sketch), since that driver is not available in the Maven repository.
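A hedged sketch of a consistent dependency set, assuming a Scala 2.11 build of spark-notebook (com.databricks also publishes spark-redshift_2.11:3.0.0-preview1):

"customDeps": [ 
    "com.databricks:spark-redshift_2.11:3.0.0-preview1", 
    "com.databricks:spark-avro_2.11:3.2.0", 
    "com.databricks:spark-csv_2.11:1.5.0" 
] 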

export EXTRA_CLASSPATH=RedshiftJDBC4-1.2.7.1003.jar 
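The variable has to be visible to the process that starts the notebook server. A minimal launch sequence, assuming the jar path and the standard ./bin/spark-notebook launcher from the binary distribution:

# Export EXTRA_CLASSPATH (with the path to the driver jar) in the same
# shell that launches spark-notebook, then start the server.
export EXTRA_CLASSPATH=/path/to/RedshiftJDBC4-1.2.7.1003.jar
./bin/spark-notebook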

Hope this helps someone else.