Writing a DataFrame to a Hive table from Apache Spark in Java

I am trying to accomplish the simple task of writing a DataFrame to a Hive table; the code, written in Java, is below. I am using the Cloudera VM without any changes.
public static void main(String[] args) {
    String master = "local[*]";
    SparkSession sparkSession = SparkSession
            .builder().appName(JsonToHive.class.getName())
            //.config("spark.sql.warehouse.dir", "hdfs://localhost:50070/user/hive/warehouse/")
            .enableHiveSupport().master(master).getOrCreate();
    SparkContext context = sparkSession.sparkContext();
    context.setLogLevel("ERROR");
    SQLContext sqlCtx = sparkSession.sqlContext();
    // read().json() replaces the deprecated jsonFile()
    Dataset<Row> rowDataset = sqlCtx.read().json("employees.json");
    rowDataset.printSchema();
    // createOrReplaceTempView() replaces the deprecated registerTempTable()
    rowDataset.createOrReplaceTempView("employeesData");
    Dataset<Row> firstRow = sqlCtx.sql("select employee.firstName, employee.addresses from employeesData");
    firstRow.show();
    sparkSession.catalog().listTables().select("*").show();
    // the default SaveMode is ErrorIfExists, which fails on an existing table
    firstRow.write().saveAsTable("default.employee");
    sparkSession.close();
}
I have created a managed table in Hive using HQL:
CREATE TABLE employee (firstName STRING, lastName STRING, addresses ARRAY < STRUCT < street:STRING, city:STRING, state:STRING > >) STORED AS PARQUET;
and I am reading data from "employees.json":
{"employee":{"firstName":"Neil","lastName":"Irani","addresses":[{"street":"36th","city":"NYC","state":"Ny"},{"street":"37th","city":"NYC","state":"Ny"},{"street":"38th","city":"NYC","state":"Ny"}]}}
It reads the simple JSON file fine, but then says "Table default.employee already exists;" and does not append the content. How can I append the content to the Hive table?
If I set the mode to "append", it does not complain, but it does not write the content either:

firstRow.write().mode("append").saveAsTable("default.employee");
Any help would be appreciated... thanks.
+-------------+--------+-----------+---------+-----------+
| name|database|description|tableType|isTemporary|
+-------------+--------+-----------+---------+-----------+
| employee| default| null| MANAGED| false|
|employeesdata| null| null|TEMPORARY| true|
+-------------+--------+-----------+---------+-----------+
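One thing worth checking, independently of the metastore configuration: the Hive table was declared with columns (firstName, lastName, addresses), but the query selects only employee.firstName and employee.addresses, so the appended DataFrame's schema does not match the target table. A minimal sketch of an append with a matching schema (this reuses the SparkSession from the question and assumes hive-site.xml is on the classpath; it is a sketch, not a verified fix):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

// Select every column declared in the Hive table, in the same order,
// so the DataFrame schema lines up with (firstName, lastName, addresses).
Dataset<Row> employees = sparkSession.sql(
        "SELECT employee.firstName, employee.lastName, employee.addresses FROM employeesData");

// SaveMode.Append adds rows instead of failing on an existing table.
employees.write().mode(SaveMode.Append).saveAsTable("default.employee");
```

Using the SaveMode enum rather than the string "append" also lets the compiler catch typos in the mode name.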
UPDATE

/usr/lib/hive/conf/hive-site.xml was not on the classpath, so Spark could not see the table; after adding it to the classpath it worked fine. I had this problem because I was running from IntelliJ; in production the spark-conf folder would be linked to hive-site.xml...
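For reference, the part of hive-site.xml that Spark needs in order to find the metastore is typically the thrift URI. A minimal fragment might look like the following (the host and port are assumptions based on the Hive metastore's default port, 9083; verify against your own hive-site.xml):

```xml
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- assumed default metastore thrift port; check your cluster's value -->
    <value>thrift://localhost:9083</value>
  </property>
</configuration>
```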
You need to create a HiveContext: HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(ctx.sc()); –
I think the root problem is that I cannot connect to the local Hive; the calls below throw "Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'employee' not found in database 'default';" hiveContext.sql("SHOW COLUMNS FROM default.employee").show(); sqlCtx.sql("SHOW COLUMNS FROM default.employee").show(); – Manjesh
Setting the config on the HiveContext... no luck either... hiveContext.setConf("hive.metastore.warehouse.dir", "hdfs://localhost:50070/user/hive/warehouse"); hiveContext.sql("SHOW COLUMNS FROM employee").show(); – Manjesh
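One detail that may matter here: 50070 is the NameNode's HTTP (web UI / WebHDFS) port, not the HDFS RPC port, so an hdfs:// URI pointing at it will not behave like a normal filesystem path. The warehouse directory is usually expressed against fs.defaultFS, whose RPC port defaults to 8020 on CDH. A sketch of the call with the assumed RPC port (check fs.defaultFS in your core-site.xml before relying on it):

```java
// hdfs:// URIs use the NameNode RPC port (commonly 8020 on CDH),
// not the 50070 web UI port.
hiveContext.setConf("hive.metastore.warehouse.dir",
        "hdfs://localhost:8020/user/hive/warehouse");
```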