2016-07-23

I have a DataFrame that I created from a Parquet file. Is it possible to update it using SparkSQL?

val file = "/user/spark/pagecounts-20160713-150000.parquet" 

val df = sqlContext.read.parquet(file) 
df.registerTempTable("wikipedia") 

Now I want to do an update:

// just a dummy update statement  
val sqlDF = sqlContext.sql("update wikipedia set requests=0 where article='!K7_Records'") 

But I get an error:

java.lang.RuntimeException: [1.1] failure: ``with'' expected but identifier update found

update wikipediaEnTemp set requests=0 where article='!K7_Records' 
^ 
    at scala.sys.package$.error(package.scala:27) 
    at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36) 
    at org.apache.spark.sql.catalyst.DefaultParserDialect.parse(ParserDialect.scala:67) 
    at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211) 
    at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211) 
    at org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114) 
    at org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:113) 
    at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:137) 
    at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136) 
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237) 
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:237) 
    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) 
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249) 
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:249) 
    at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:197) 
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) 
    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:249) 
    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:217) 
    at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882) 
    at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:882) 
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) 
    at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:881) 
    at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110) 
    at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34) 
    at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:208) 
    at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:208) 
    at org.apache.spark.sql.execution.datasources.DDLParser.parse(DDLParser.scala:43) 
    at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:231) 
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817) 
    ... 57 elided 
Can't you update the DataFrame directly? Why do you need SparkSQL?

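Yes, I can update the DataFrame directly. The reason I need SparkSQL is that I want to give users an interface they are already good at, namely SQL. The users are unlikely to write any Java/Scala/Python code, so I want to hide low-level details such as Spark DataFrames and RDDs from them.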

Answers

RDDs and DataFrames are immutable, because the underlying data is immutable. For that reason SparkSQL does not include DML statements such as UPDATE.
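Because a DataFrame cannot be changed in place, the usual workaround is to derive a new DataFrame in which the affected rows carry the new value. The following is a minimal sketch of that idea, not a drop-in replacement for UPDATE. It assumes the table has the `article` and `requests` columns used in the question; the object and method names are hypothetical, and the API style follows the Spark 1.x `SQLContext` seen in the stack trace.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, lit, when}

// Hypothetical helper object, for illustration only.
object UpdateByRewrite {

  // Returns a NEW DataFrame where `requests` is 0 for the given article
  // and unchanged for every other row. The input DataFrame is not modified.
  def zeroRequests(df: DataFrame, article: String): DataFrame =
    df.withColumn(
      "requests",
      when(col("article") === article, lit(0)).otherwise(col("requests")))

  // The same rewrite as a SELECT with CASE WHEN, for users who only write SQL.
  // Assumes a temp table named "wikipedia" is registered, as in the question.
  // Only the two columns known from the question are selected here; any other
  // columns of the real table would have to be listed as well.
  def zeroRequestsSql(sqlContext: SQLContext): DataFrame =
    sqlContext.sql(
      """SELECT article,
        |       CASE WHEN article = '!K7_Records' THEN 0 ELSE requests END AS requests
        |FROM wikipedia""".stripMargin)
}
```

A possible use is `UpdateByRewrite.zeroRequests(df, "!K7_Records").registerTempTable("wikipedia_updated")`, where `wikipedia_updated` is a made-up name; users would then query that table instead of the original one. If the result has to be persisted, it is safer to write it to a different Parquet path than the one being read.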

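Spark tables are immutable, so a direct update is not possible. However, if you can change the schema and the queries, you can perform an equivalent update using append-only operations. The general problem is known in the data warehousing community as a Type II Slowly Changing Dimension. There is a Spark package for this, which I have not worked with.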
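The append-only idea can be sketched roughly as follows. This is a simplified variant that uses a single version number per row rather than the effective-date ranges of a full Type II Slowly Changing Dimension, and it is not the Spark package mentioned above. The `version` column and all object and method names are assumptions made for illustration; `article` comes from the question. `unionAll` matches the Spark 1.x API in the stack trace (later versions also offer `union`).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{lit, max}

// Hypothetical helper object, for illustration only.
object AppendOnlyUpdate {

  // Tag the original rows as version 0 (`version` is an assumed extra column).
  def withInitialVersion(df: DataFrame): DataFrame =
    df.withColumn("version", lit(0))

  // An "update" is an append: `changes` holds the new row values with a higher
  // version. unionAll matches columns by position, so `changes` must have the
  // same columns in the same order as `history`.
  def append(history: DataFrame, changes: DataFrame): DataFrame =
    history.unionAll(changes)

  // Current view: for each article keep only the row with the highest version.
  def current(history: DataFrame): DataFrame = {
    val latest = history.groupBy("article").agg(max("version").as("version"))
    history.join(latest, Seq("article", "version"))
  }
}
```

The result of `current` can be registered as a temp table so that SQL users only ever see the latest value per article, while the full history stays available for auditing.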
