2016-12-13 82 views

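Error when submitting a Spark job with spark-jobserver

I occasionally get the following error when submitting a job. The error goes away if I delete the rootdir directories of filedao, datadao and sqldao, but that means I have to restart the job server and re-upload my jar.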

{ 
    "status": "ERROR", 
    "result": { 
    "message": "Ask timed out on [Actor[akka://JobServer/user/context-supervisor/1995aeba-com.spmsoftware.distributed.job.TestJob#-1370794810]] after [10000 ms]. Sender[null] sent message of type \"spark.jobserver.JobManagerActor$StartJob\".", 
    "errorClass": "akka.pattern.AskTimeoutException", 
    "stack": ["akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)", "akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)", "scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)", "scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)", "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)", "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)", "akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)", "akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)", "akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)", "java.lang.Thread.run(Thread.java:745)"] 
    } 
} 

My configuration file is as follows:

# Template for a Spark Job Server configuration file 
# When deployed these settings are loaded when job server starts 
# 
# Spark Cluster/Job Server configuration 
spark { 
    # spark.master will be passed to each job's JobContext 
    master = <spark_master> 

    # Default # of CPUs for jobs to use for Spark standalone cluster 
    job-number-cpus = 4 

    jobserver { 
    port = 8090 

    context-per-jvm = false 
    context-creation-timeout = 100 s 
    # Note: JobFileDAO is deprecated from v0.7.0 because of issues in 
    # production and will be removed in future, now defaults to H2 file. 
    jobdao = spark.jobserver.io.JobSqlDAO 

    filedao { 
     rootdir = /tmp/spark-jobserver/filedao/data 
    } 

    datadao { 
     rootdir = /tmp/spark-jobserver/upload 
    } 

    sqldao { 
     slick-driver = slick.driver.H2Driver 

     jdbc-driver = org.h2.Driver 

     rootdir = /tmp/spark-jobserver/sqldao/data 

     jdbc { 
     url = "jdbc:h2:file:/tmp/spark-jobserver/sqldao/data/h2-db" 
     user = "" 
     password = "" 
     } 

     dbcp { 
     enabled = false 
     maxactive = 20 
     maxidle = 10 
     initialsize = 10 
     } 
    } 
    result-chunk-size = 1m 
    short-timeout = 60 s  
    } 

    context-settings { 
    num-cpu-cores = 2   # Number of cores to allocate. Required. 
    memory-per-node = 512m   # Executor memory per node, -Xmx style eg 512m, #1G, etc. 

    } 

} 

akka { 
    remote.netty.tcp { 
    # This controls the maximum message size, including job results, that can be sent 
    # maximum-frame-size = 200 MiB 
    } 
} 

# check the reference.conf in spray-can/src/main/resources for all defined settings 
spray.can.server.parsing.max-content-length = 250m 

I am using the spark-2.0-preview version.

Answer


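I ran into the same error before, and it was related to the timeout. Most likely you are making a synchronous request (sync=true). Together with that flag you must supply a timeout value (in seconds), chosen according to how long your application needs to finish processing.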

For example, the request should look like this:

curl -k --basic -d '' 'http://localhost:5050/jobs?appName=app&classPath=Main&context=test-context&sync=true&timeout=40' 
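
The same request can also be issued from a script. Below is a minimal Python sketch, assuming the job server listens on localhost:5050 as in the curl example above. The helper names build_job_url and submit_sync_job are hypothetical, and the extra 5 seconds added to the client-side socket timeout is an arbitrary margin, not something the job server requires.

```python
import json
import urllib.parse
import urllib.request


def build_job_url(base_url, app_name, class_path, context, timeout_s):
    """Build the /jobs URL for a synchronous submission (hypothetical helper)."""
    query = urllib.parse.urlencode({
        "appName": app_name,
        "classPath": class_path,
        "context": context,
        "sync": "true",        # wait for the job result in the HTTP response
        "timeout": timeout_s,  # seconds the job server waits for the job
    })
    return f"{base_url}/jobs?{query}"


def submit_sync_job(url, timeout_s, body=b""):
    """POST an empty (or config) body and parse the JSON reply."""
    request = urllib.request.Request(url, data=body, method="POST")
    # Assumption: give the client a little more time than the server-side timeout.
    with urllib.request.urlopen(request, timeout=timeout_s + 5) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = build_job_url("http://localhost:5050", "app", "Main", "test-context", 40)
    print(url)
    # Uncomment when a job server is actually running:
    # print(submit_sync_job(url, 40))
```

Keeping the URL construction separate from the network call makes it easy to check that the timeout parameter is really being sent before debugging the server side.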

If your request takes longer than 40 seconds, you may also need to edit application.conf, located at

spark-jobserver-master/job-server/src/main/resources/application.conf 

In the spray.can.server section, change:

idle-timeout = 210 s 
request-timeout = 200 s
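
The three timeouts in play have to be ordered sensibly. As a rough sanity check, the timeout query parameter should be shorter than request-timeout, and request-timeout should be shorter than idle-timeout. The first rule is my reading of the advice above rather than documented behaviour, and the function name check_timeouts is hypothetical. A minimal Python sketch:

```python
def check_timeouts(job_timeout_s, request_timeout_s, idle_timeout_s):
    """Return a list of ordering problems; an empty list means the values are consistent."""
    problems = []
    # Assumption: the HTTP layer must outlive the synchronous job wait.
    if job_timeout_s >= request_timeout_s:
        problems.append("job timeout should be below request-timeout")
    # The connection idle timeout should exceed the request timeout.
    if request_timeout_s >= idle_timeout_s:
        problems.append("request-timeout should be below idle-timeout")
    return problems


if __name__ == "__main__":
    # Values from the curl example and the spray.can.server settings above.
    print(check_timeouts(40, 200, 210))
```

With timeout=40, request-timeout = 200 s and idle-timeout = 210 s, no problems are reported.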