Start HiveThriftServer programmatically in Python

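In the Spark shell (Scala), we import org.apache.spark.sql.hive.thriftserver._ to start the Hive Thrift server programmatically for a particular Hive context, calling HiveThriftServer2.startWithContext(hiveContext) to expose a registered temp table for that particular session.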

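How can we do the same using Python? Is there a package/API in Python for importing HiveThriftServer? Any other thoughts or suggestions are appreciated.

We have used pyspark to create a DataFrame.

Thanks

Ravi Narayanan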

Source: 2016-04-14, Ravi Narayanan

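Why do you need a Thrift server, given that it is a temp table? Couldn't you create your own HiveContext that connects to a locally created temporary metastore? – user1314742

By the way, why do you need to start it from within your code? – user1314742

If we start the Thrift server as a daemon, we cannot see the temp tables (its session is different from the one in which we started the HiveContext, and temp tables are only available within their own session) –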

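You can import it through the py4j Java gateway. The code below works with Spark 2.0.2, and temp tables registered in the Python script can then be queried through beeline.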

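from py4j.java_gateway import java_import
from pyspark.sql import SparkSession

# Placeholder values (assumptions; the original answer leaves these undefined).
app_name = "thrift-server-example"
master = "local[*]"

spark = SparkSession \
    .builder \
    .appName(app_name) \
    .master(master) \
    .enableHiveSupport() \
    .config('spark.sql.hive.thriftServer.singleSession', True) \
    .getOrCreate()
sc = spark.sparkContext
sc.setLogLevel('INFO')

# This call must come after sc is defined. The import string is empty in the
# original answer; the fully qualified class name is used below instead.
java_import(sc._gateway.jvm, "")

# Start the Thrift server inside the JVM, passing the JVM-side context that
# corresponds to this PySpark session.
sc._gateway.jvm.org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.startWithContext(spark._jwrapped)

# Placeholder statement from the original answer; a real table will likely need a column list.
spark.sql('CREATE TABLE myTable')
data_file = "path to csv file with data"
dataframe = spark.read.option("header", "true").csv(data_file).cache()
dataframe.createOrReplaceTempView("myTempView")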
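Before switching to a JDBC client, it can help to confirm from Python that the Thrift server is actually accepting connections. The following is a minimal sketch using only the standard library; the helper name thrift_port_open is hypothetical, and it assumes the server listens on localhost:10000, the address used in the connection string in this answer. It only checks that the TCP port is reachable, not that queries succeed.

```python
import socket


def thrift_port_open(host="localhost", port=10000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only verifies that something is listening on the
    port, not that the listener is a working Thrift server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling thrift_port_open() right after startWithContext may return False for a short while if the server is still starting, so retrying after a brief pause is a reasonable choice.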

Then go to beeline to check that it started correctly:

in terminal> $SPARK_HOME/bin/beeline 
beeline> !connect jdbc:hive2://localhost:10000 
beeline> show tables;

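It should list the tables and temp tables/views created in Python, including "myTable" and "myTempView" above. The Thrift server must share the same Spark session for the temp views to be visible (see the answer to: Avoid starting HiveThriftServer2 with created context programmatically).

Note: Hive tables can still be accessed when the Thrift server is started from the terminal and connected to the same metastore, but temp views cannot, because they live in the Spark session and are not written to the metastore.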
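The distinction in the note above suggests an alternative when the Thrift server is started separately: write the DataFrame to the metastore as a table instead of registering a temp view. The following is a minimal sketch, assuming a Hive-enabled SparkSession like the one above; the helper name persist_as_hive_table and the table name in the usage line are hypothetical, while DataFrameWriter.mode and saveAsTable are standard PySpark calls.

```python
def persist_as_hive_table(dataframe, table_name, mode="overwrite"):
    """Write `dataframe` to the metastore as a table and return the table name.

    Assumes `dataframe` is a pyspark.sql.DataFrame created from a SparkSession
    with Hive support enabled. Unlike a temp view, the resulting table is
    recorded in the metastore rather than only in the current session.
    """
    dataframe.write.mode(mode).saveAsTable(table_name)
    return table_name
```

With the DataFrame from the script above, this could be called as persist_as_hive_table(dataframe, "myPersistedTable"). The trade-off is that the data is physically written out, whereas a temp view is only a session-scoped name for the DataFrame.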

Source: 2016-12-29 22:03:15
