I am trying to read and write data stored in a remote Hive Server from PySpark. I am following this example: "Pyspark: select data from a remote Hive server"
from os.path import expanduser, join, abspath
from pyspark.sql import SparkSession
from pyspark.sql import Row
# warehouse_location points to the default location for managed databases and tables
warehouse_location = 'hdfs://quickstart.cloudera:8020/user/hive/warehouse'
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .enableHiveSupport() \
    .getOrCreate()
The example demonstrates how to create a new table in the warehouse:
# spark is an existing SparkSession
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
# Queries are expressed in HiveQL
spark.sql("SELECT * FROM src").show()
However, I need to access an existing table, iris, which was created in mytest.db, so the table location is
table_path = warehouse_location + '/mytest.db/iris'
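As a small sketch of the path convention used above, assuming the table is a managed table stored at Hive's default location (`<warehouse>/<database>.db/<table>`), the location can be derived from its parts. The helper name `hive_table_location` is hypothetical and not part of PySpark:

```python
def hive_table_location(warehouse_dir, database, table):
    """Return the default location of a managed Hive table.

    Assumes the default layout <warehouse>/<database>.db/<table>;
    a table created with an explicit LOCATION may live elsewhere.
    """
    return "{}/{}.db/{}".format(warehouse_dir.rstrip("/"), database, table)


warehouse_location = 'hdfs://quickstart.cloudera:8020/user/hive/warehouse'
table_path = hive_table_location(warehouse_location, 'mytest', 'iris')
print(table_path)
```

Note that Spark SQL normally addresses a Hive table by name (for example `mytest.iris`) through the metastore, so this path is mostly useful for checking where the files are.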
How do I select from the existing table?
Update
I have the metastore URL:
http://test.mysite.net:8888/metastore/table/mytest/iris
and the table location URL:
hdfs://quickstart.cloudera:8020/user/hive/warehouse/mytest.db/iris
When I use hdfs://quickstart.cloudera:8020/user/hive/warehouse as the warehouse location in the code above and try:
spark.sql("use mytest")
I get an exception:
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: "Database 'mytest' not found;"
What is the correct URL to use in order to select from iris?
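One thing that may be worth checking, offered as a sketch rather than a confirmed fix: the `Database 'mytest' not found` error is consistent with the session talking to a local metastore instead of the remote one, and the http://...:8888/metastore/... address looks like a web UI page rather than a metastore endpoint. A remote Hive metastore is usually reached over Thrift through the `hive.metastore.uris` setting. The host used in the usage note and the port 9083 (Hive's usual default) are assumptions, and the helper names `remote_hive_conf`, `build_session`, and `preview_table` are hypothetical:

```python
def remote_hive_conf(metastore_host, metastore_port=9083, warehouse_dir=None):
    """Build Spark config entries pointing at a remote Hive metastore.

    The host and port are assumptions; use the values from the
    cluster's hive-site.xml (property hive.metastore.uris).
    """
    conf = {
        "hive.metastore.uris": "thrift://{}:{}".format(metastore_host, metastore_port)
    }
    if warehouse_dir is not None:
        conf["spark.sql.warehouse.dir"] = warehouse_dir
    return conf


def build_session(conf, app_name="remote-hive-check"):
    """Create a Hive-enabled SparkSession from a dict of config entries."""
    # Imported here so remote_hive_conf can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.enableHiveSupport().getOrCreate()


def preview_table(spark, database, table):
    """List the databases the session can see, then show rows from the table."""
    spark.sql("SHOW DATABASES").show()
    spark.sql("SELECT * FROM {}.{}".format(database, table)).show()
```

A possible usage, assuming the metastore runs on quickstart.cloudera: `spark = build_session(remote_hive_conf("quickstart.cloudera", warehouse_dir=warehouse_location))` followed by `preview_table(spark, "mytest", "iris")`. If `SHOW DATABASES` lists only `default`, the session is probably still not connected to the remote metastore.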
Thanks for pointing out which database to use! Please see my update to the question. I cannot figure out which URL to use, please help. – dokondr