2017-04-24 156 views

I'm running Spark 2.1.0 on an AWS EMR cluster (set up following https://aws.amazon.com/blogs/big-data/running-jupyter-notebook-and-jupyterhub-on-amazon-emr/). Querying Hive returns an empty result.

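I'm trying to query a table that exists in a remote Hive and has data in it. Spark infers the schema correctly, but the table content comes back empty. Any ideas?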

import os 
import findspark 
findspark.init('/usr/lib/spark/') 

# Spark related imports 
from pyspark.sql import SparkSession 
from pyspark import SparkContext 

sc = SparkContext.getOrCreate() 
spark = SparkSession.builder.config(conf=sc.getConf()).getOrCreate() 

remote_hive = "jdbc:hive2://myhost:10000/mydb" 
driver = "org.apache.hive.jdbc.HiveDriver" 
user="user" 
password = "password" 

df = spark.read.format("jdbc").\ 
    options(url=remote_hive, 
      driver=driver, 
      user=user, 
      password=password, 
      dbtable="mytable").load() 


df.printSchema() 
# returns the right schema 
df.count() 
# returns 0 

Answer


You could try:

# remote_hive and driver are the variables defined in the question
spark\
    .read.format("jdbc")\
    .option("driver", driver)\
    .option("url", remote_hive)\
    .option("dbtable", "mytable")\
    .option("user", "user")\
    .option("password", "password")\
    .load()

Same result: an empty table (df.count() = 0).
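
Since both JDBC variants return an empty table, one alternative worth trying is to skip the JDBC source and let Spark read the table through the Hive metastore instead. This is an untested sketch, not a confirmed fix for this cluster. It assumes the remote metastore's Thrift service listens on myhost at port 9083 (the usual default, but not stated in the question), and that the EMR nodes can reach both the metastore and the storage behind the table. The helper names metastore_conf and build_session are made up for illustration.

```python
def metastore_conf(host, port=9083):
    """Build the Spark setting that points at a remote Hive metastore.

    host and port are assumptions; 9083 is the usual metastore Thrift port.
    """
    return {"hive.metastore.uris": "thrift://{}:{}".format(host, port)}


def build_session(conf):
    """Create (or reuse) a SparkSession with Hive support enabled."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.enableHiveSupport()
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage inside the notebook (hypothetical host, same table as the question):
#   spark = build_session(metastore_conf("myhost"))
#   spark.table("mydb.mytable").count()
```

Hive support most likely has to be enabled when the session is first created, so this should run in a fresh kernel rather than after the SparkSession from the question already exists.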