我試圖運行一個獨立的Spark集羣星火SQL超時
select a.name, b.name, s.score
from score s
inner join A a on a.id = s.a_id
inner join B b on b.id = s.b_id
where pmod(a.id, 3) != 3 and pmod(b.id, 3) != 0
表尺寸如下一個相對簡單的星火SQL命令
A: 25,000
B: 2,500,000
score: 25,000,000
因此,從這個我期望得到2500萬行的結果。我想用Spark SQL運行這個查詢,然後處理每一行。下面是相關火花代碼
val sqlContext = new HiveContext(sc)
val sql = "<above SQL>"
sqlContext.sql(sql).first
此命令運行正常,當表分數的大小爲20萬,但現在不運行。以下是相關的日誌
14/12/04 16:35:14 WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems.
14/12/04 16:35:43 WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems.
14/12/04 16:36:24 WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems.
14/12/04 16:37:11 WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems.
14/12/04 16:38:13 WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems.
14/12/04 16:39:19 WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems.
14/12/04 16:39:48 WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems.
14/12/04 16:40:08 WARN MemoryStore: Not enough space to store block broadcast_12 in memory! Free memory is 1938057068 bytes.
14/12/04 16:40:08 WARN MemoryStore: Persisting block broadcast_12 to disk instead.
java.util.concurrent.TimeoutException: Futures timed out after [5 minutes]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.sql.execution.BroadcastHashJoin.execute(joins.scala:431)
at org.apache.spark.sql.execution.Project.execute(basicOperators.scala:42)
at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:111)
at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
at org.apache.spark.rdd.RDD.first(RDD.scala:1092)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:20)
at $iwC$$iwC$$iwC.<init>(<console>:25)
at $iwC$$iwC.<init>(<console>:27)
at $iwC.<init>(<console>:29)
at <init>(<console>:31)
at .<init>(<console>:35)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:859)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:771)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:616)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:624)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:629)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:954)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:997)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
我最初的想法是增加此超時,但這並不無需重新編譯源作爲表演here看起來可能。在父目錄中,我也看到了一些不同的連接,但我不確定如何讓spark使用其他類型的連接。
我也試圖通過增加spark.executor.memory到10g來解決我的第一個關於堅持磁盤的警告,但那並沒有解決問題。
有誰知道我可以如何運行此查詢?
看到類似的問題 - 你有沒有找到這個解決方案? – NightWolf 2014-12-23 04:00:23