2017-04-27 97 views
1

I don't think my title explains it well, so here is the actual question: how do I update/delete data in Spark-Hive?

My build.sbt:

name := "Hello" 
scalaVersion := "2.11.8" 
version  := "1.0" 

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" 
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" 
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0" 

Code:

val sparkSession = SparkSession.builder().enableHiveSupport().appName("HiveOnSpark").master("local").getOrCreate() 
val hiveql : HiveContext = new HiveContext(sparkSession.sparkContext); 

hiveql.sql("drop table if exists test") 
hiveql.sql("create table test (id int, name string) stored as orc tblproperties(\"transactional\"=\"true\")") 
hiveql.sql("insert into test values(1,'Yash')") 
hiveql.sql("insert into test values(2,'Yash')") 
hiveql.sql("insert into test values(3,'Yash')") 
hiveql.sql("select * from test").show() 
hiveql.sql("delete from test where id= 1") 

Problem:

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: 
Operation not allowed: delete from(line 1, pos 0) 

== SQL == 
delete from test where id= 1 
^^^ 

at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39) 
at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitFailNativeCommand$1.apply(SparkSqlParser.scala:925) 
at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitFailNativeCommand$1.apply(SparkSqlParser.scala:916) 
at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:93) 
at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitFailNativeCommand(SparkSqlParser.scala:916) 
at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitFailNativeCommand(SparkSqlParser.scala:52) 
at org.apache.spark.sql.catalyst.parser.SqlBaseParser$FailNativeCommandContext.accept(SqlBaseParser.java:952) 
at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42) 
at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66) 
at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66) 
at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:93) 
at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:65) 
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:54) 
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53) 
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:82) 
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45) 
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53) 
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592) 
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699) 
at main.scala.InitMain$.delayedEndpoint$main$scala$InitMain$1(InitMain.scala:41) 
at main.scala.InitMain$delayedInit$body.apply(InitMain.scala:9) 
at scala.Function0$class.apply$mcV$sp(Function0.scala:34) 
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12) 
at scala.App$$anonfun$main$1.apply(App.scala:76) 
at scala.App$$anonfun$main$1.apply(App.scala:76) 
at scala.collection.immutable.List.foreach(List.scala:381) 
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35) 
at scala.App$class.main(App.scala:76) 
at main.scala.InitMain$.main(InitMain.scala:9) 
at main.scala.InitMain.main(InitMain.scala) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:498) 
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) 

The update query fails with the same error.

So by now I have gone through This, This, update query in Spark SQL, This, This and many other posts.

I already know that Spark does not support update/delete, but I am in a situation where I need both operations. Can anyone suggest or help somehow?

+0

Have you tried doing the update/delete through a JDBC connection? – dirceusemighini
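The JDBC route suggested here would bypass Spark's SQL parser entirely and send the DML straight to HiveServer2, which does accept DELETE/UPDATE on transactional tables. A rough sketch only: it assumes a running HiveServer2 with ACID enabled and the `hive-jdbc` driver on the classpath, and the URL, user, and password below are placeholders.

```scala
import java.sql.DriverManager

// Sketch: issue DELETE/UPDATE directly against HiveServer2 over JDBC,
// bypassing Spark's parser, which rejects these statements.
// The connection URL and credentials are placeholders.
val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "user", "password")
try {
  val stmt = conn.createStatement()
  // These run as Hive ACID operations, so the table must be
  // bucketed, stored as ORC, and marked transactional.
  stmt.execute("delete from test where id = 1")
  stmt.execute("update test set name = 'Yash2' where id = 2")
  stmt.close()
} finally {
  conn.close()
}
```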

Answer

0

A not very performant workaround would be to:

  1. Load the existing data (I would recommend using the DataFrame API)
  2. Create the updated/deleted records
  3. Rewrite the new DataFrame to disk

Choosing a suitable partitioning scheme for the Hive table can minimize the amount of data that has to be rewritten.
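The steps above might look roughly like this with the DataFrame API. This is a sketch only: `test_tmp` is a hypothetical staging table, the filter/update values are just for illustration, and the rewrite goes through that staging table because Spark cannot safely overwrite a table while reading from it.

```scala
import org.apache.spark.sql.functions.when
import sparkSession.implicits._

// 1. Load the existing data
val df = sparkSession.table("test")

// 2. Create the updated/deleted records:
//    drop the row with id = 1, rename the row with id = 2
val modified = df
  .filter($"id" =!= 1)
  .withColumn("name", when($"id" === 2, "Yash2").otherwise($"name"))

// 3. Rewrite the new DataFrame to disk, materializing into a
//    staging table first so the overwrite does not read from
//    the table it is replacing
modified.write.mode("overwrite").saveAsTable("test_tmp")
sparkSession.sql("drop table test")
sparkSession.sql("alter table test_tmp rename to test")
```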

+0

Yes, this can solve the problem for a small warehouse, but what if I have a warehouse with 300,000 records and need to delete 20 and update 10 of them? I don't think anyone would prefer rewriting the whole DataFrame in that case. –

+0

@yashpalbharadwaj it is better than nothing.... –

+0

Yes, I know, and I am using it for now, but there must be another way. That is what I am looking for. –