
I have a Spark SQL 2.1.1 job that runs on a cluster in cluster mode, and I want to create an empty external Hive table (the partitions, with locations, will be added in a later step). How do I create an external Hive table without a location?

CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP) PARTITIONED BY (year INT, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 

When I run the job, I get the error:

CREATE EXTERNAL TABLE must be accompanied by LOCATION

However, when I run the same query in the Hive editor in Hue, it runs just fine. I tried to find an answer in the Spark SQL 2.1.1 documentation, without success.

Does anyone know why Spark SQL is stricter about this query?

Comments:

Just use the root folder of the table, or any location for that matter.

P.S. There is no such thing as a table without a location. If you haven't defined a location, the default path is used.
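A minimal sketch of the workaround the comments point at: give the table any root location up front, then attach per-partition locations in the later step. Both paths below are hypothetical placeholders.

// Create the external table with a placeholder root location.
spark.sql("""CREATE EXTERNAL TABLE IF NOT EXISTS new_table (
    id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP)
  PARTITIONED BY (year INT, month INT, day INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/user/hive/warehouse/new_table'""")

// Later, point each partition at its own directory.
spark.sql("""ALTER TABLE new_table ADD IF NOT EXISTS
  PARTITION (year = 2017, month = 5, day = 31)
  LOCATION '/data/events/2017/05/31'""")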

Answer


TL;DR: EXTERNAL without LOCATION is not allowed.

The exact answer is in Spark SQL's grammar definition file, SqlBase.g4.

You can find CREATE EXTERNAL TABLE defined as createTableHeader:

CREATE TEMPORARY? EXTERNAL? TABLE (IF NOT EXISTS)? tableIdentifier 

This definition is used in the supported SQL statements.
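For context, the grammar alternative that handles this statement, #createHiveTable, reads roughly as follows (paraphrased from the Spark 2.x SqlBase.g4, so treat it as a sketch rather than an exact quote):

createTableHeader ('(' columns=colTypeList ')')?
    (COMMENT comment=STRING)?
    (PARTITIONED BY '(' partitionColumns=colTypeList ')')?
    bucketSpec? skewSpec?
    rowFormat? createFileFormat? locationSpec?
    (TBLPROPERTIES tableProps=tablePropertyList)?
    (AS? query)?                                  #createHiveTable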

Unless I'm mistaken, locationSpec is optional. That is according to the ANTLR grammar. The code can make other decisions, though, and it indeed does:

scala> spark.version 
res4: String = 2.3.0-SNAPSHOT 

scala> val q = "CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP) PARTITIONED BY (year INT, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'" 
scala> sql(q) 
org.apache.spark.sql.catalyst.parser.ParseException: 
Operation not allowed: CREATE EXTERNAL TABLE must be accompanied by LOCATION(line 1, pos 0) 

== SQL == 
CREATE EXTERNAL TABLE IF NOT EXISTS new_table (id BIGINT, StartTime TIMESTAMP, EndTime TIMESTAMP) PARTITIONED BY (year INT, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' 
^^^ 

    at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39) 
    at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1.apply(SparkSqlParser.scala:1096) 
    at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitCreateHiveTable$1.apply(SparkSqlParser.scala:1064) 
    at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) 
    at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitCreateHiveTable(SparkSqlParser.scala:1064) 
    at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitCreateHiveTable(SparkSqlParser.scala:55) 
    at org.apache.spark.sql.catalyst.parser.SqlBaseParser$CreateHiveTableContext.accept(SqlBaseParser.java:1124) 
    at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42) 
    at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) 
    at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) 
    at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) 
    at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70) 
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69) 
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68) 
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97) 
    at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48) 
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68) 
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623) 
    ... 48 elided 

The default SparkSqlParser (whose astBuilder is a SparkSqlAstBuilder) has the following assertion that leads to the exception:

if (external && location.isEmpty) { 
    operationNotAllowed("CREATE EXTERNAL TABLE must be accompanied by LOCATION", ctx) 
}
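So the restriction lives in the AST builder, not in the grammar itself. As a quick sanity check (a sketch; /tmp/new_table is a hypothetical placeholder path), appending a LOCATION clause to the same statement gets past the assertion:

scala> sql(q + " LOCATION '/tmp/new_table'")  // parses; no "must be accompanied by LOCATION" error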

I would consider reporting an issue in Spark's JIRA if you think the case should be allowed. See SPARK-2825, where there is a strong argument in support:

CREATE EXTERNAL TABLE already works as far as I know and should have the same semantics as Hive.