Kudu with PySpark2: Error with KuduStorageHandler
I am trying to read data stored in Kudu using PySpark 2.1.0:
>>> from os.path import expanduser, join, abspath
>>> from pyspark.sql import SparkSession
>>> from pyspark.sql import Row
>>> spark = SparkSession.builder \
...     .master("local") \
...     .appName("HivePyspark") \
...     .config("hive.metastore.warehouse.dir", "hdfs:///user/hive/warehouse") \
...     .enableHiveSupport() \
...     .getOrCreate()
>>> spark.sql("select count(*) from mySchema.myTable").show()
I have Kudu 1.2.0 installed on the cluster. These are Hive/Impala tables.
When I execute the last line, I get the following error:
.
.
.
: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.com.cloudera.kudu.hive.KuduStorageHandler
.
.
.
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.com.cloudera.kudu.hive.KuduStorageHandler
at org.apache.hadoop.hive.ql.metadata.HiveUtils.getStorageHandler(HiveUtils.java:315)
at org.apache.hadoop.hive.ql.metadata.Table.getStorageHandler(Table.java:284)
... 61 more
Caused by: java.lang.ClassNotFoundException: com.cloudera.kudu.hive.KuduStorageHandler
I referred to the following resources:
https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
https://github.com/bkvarda/iot_demo/blob/master/total_data_count.py
https://kudu.apache.org/docs/developing.html#_kudu_python_client
I would like to know how I can include the Kudu-related dependencies in my PySpark program so that I can get past this error.
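I have the same problem and I can't get it to work. Could you share your code? I have passed the kudu-spark2 jar to pyspark2, and the SparkContext is created correctly as the 'spark' variable. But when I try 'spark.sql(...).show()' I get 'Error in loading storage handler.com.cloudera.kudu.hive.KuduStorageHandler' – Susensio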
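The code is the same as above. The only difference is that I supplied the Maven package that matches my configuration: https://mvnrepository.com/artifact/org.apache.kudu/kudu-spark2_2.11 For helper code, see: https://github.com/asarraf/KuduPyspark/blob/master/kuduspark2.template.py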
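For anyone landing here, the following is a minimal sketch of what "supplying the Maven package" might look like, not a copy of the linked helper code. It assumes the connector version 1.2.0 matches the Kudu 1.2.0 cluster from the question, that Spark was built against Scala 2.11, and that the table is read directly through the kudu-spark data source instead of through `spark.sql` on the Hive metastore table. The function names, the master address, and the `impala::` table name in the usage note are hypothetical placeholders.

```python
def kudu_package(kudu_version, scala_version="2.11"):
    """Build the Maven coordinate of the kudu-spark2 connector."""
    return "org.apache.kudu:kudu-spark2_{0}:{1}".format(scala_version, kudu_version)


def kudu_read_options(master, table):
    """Options passed to the kudu-spark data source when loading a table."""
    return {"kudu.master": master, "kudu.table": table}


def count_kudu_rows(master, table, kudu_version="1.2.0"):
    """Count the rows of a Kudu table through the kudu-spark data source.

    Assumption: the version and Scala suffix must match your own cluster.
    """
    # Imported lazily so the two helpers above work without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("KuduPyspark")
        # Ask Spark to fetch the connector from Maven when the JVM starts.
        .config("spark.jars.packages", kudu_package(kudu_version))
        .getOrCreate()
    )
    # Read from Kudu directly instead of going through the Hive table.
    df = (
        spark.read.format("org.apache.kudu.spark.kudu")
        .options(**kudu_read_options(master, table))
        .load()
    )
    return df.count()
```

A call might look like `count_kudu_rows("kudu-master.example.com:7051", "impala::mySchema.myTable")`, where both arguments are placeholders for your own Kudu master address and table name. Inside an already running pyspark2 shell the session exists before this code runs, so the `spark.jars.packages` setting would probably have no effect there; passing the same coordinate with `--packages` when launching the shell should be the equivalent.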