Querying Hive returns an empty result
I am running Spark 2.1.0 on an AWS EMR cluster (set up following https://aws.amazon.com/blogs/big-data/running-jupyter-notebook-and-jupyterhub-on-amazon-emr/).
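I am trying to query a table that exists and contains data in a remote Hive instance. Spark infers the schema correctly, but the table comes back empty. Any ideas?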
import findspark
# Point findspark at the EMR Spark installation so pyspark can be imported
findspark.init('/usr/lib/spark/')
# Spark related imports
from pyspark.sql import SparkSession
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
spark = SparkSession.builder.config(conf=sc.getConf()).getOrCreate()
remote_hive = "jdbc:hive2://myhost:10000/mydb"
driver = "org.apache.hive.jdbc.HiveDriver"
user = "user"
password = "password"
# Read the remote table through the Hive JDBC driver (HiveServer2)
df = (spark.read.format("jdbc")
      .options(url=remote_hive,
               driver=driver,
               user=user,
               password=password,
               dbtable="mytable")
      .load())
df.printSchema()
# prints the correct schema
df.count()
# returns 0
Same result: the table is empty (df.count() = 0).
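One thing that may be worth testing (an assumption, not a confirmed fix): in Spark 2.1 the JDBC `fetchsize` option defaults to 0, and Spark forwards it to the driver's `Statement.setFetchSize`; some older Hive JDBC drivers are reported to take a fetch size of 0 literally and return no rows. The sketch below builds the same option map as the question, plus an explicit `fetchsize`. The helper name `hive_jdbc_options` and the value 1000 are hypothetical choices, not part of the original setup.

```python
def hive_jdbc_options(url, user, password, table, fetchsize=1000):
    """Return options for spark.read.format("jdbc").options(**opts).

    Hypothetical helper: the explicit "fetchsize" is the only change
    compared with the options used in the question.
    """
    if fetchsize <= 0:
        # Assumption: a fetch size of 0 is what may cause the empty result,
        # so only positive values are accepted here.
        raise ValueError("fetchsize must be a positive integer")
    return {
        "url": url,
        "driver": "org.apache.hive.jdbc.HiveDriver",
        "user": user,
        "password": password,
        "dbtable": table,
        # Spark passes this value to Statement.setFetchSize on the JDBC driver.
        "fetchsize": str(fetchsize),
    }


opts = hive_jdbc_options("jdbc:hive2://myhost:10000/mydb",
                         "user", "password", "mytable")
print(opts["fetchsize"])  # 1000
```

With a running session this would be used as `df = spark.read.format("jdbc").options(**opts).load()`, followed by `df.count()` to see whether rows now come back.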