Apache Spark Worker Timeout

I have been running into one problem after another trying to use Spark, and I believe it comes down to networking or permissions, or both. Nothing in the master or worker logs, and no error that is thrown, hints at what the problem is.

15/12/29 19:19:58 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:20:13 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:20:28 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:20:43 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:20:58 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
15/12/29 19:21:11 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/8 is now EXITED (Command exited with code 1) 
15/12/29 19:21:11 INFO SparkDeploySchedulerBackend: Executor app-20151229141057-0000/8 removed: Command exited with code 1 
15/12/29 19:21:11 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 8 
15/12/29 19:21:11 INFO AppClient$ClientEndpoint: Executor added: app-20151229141057-0000/10 on worker-20151229141026-127.0.0.1-48818 (127.0.0.1:48818) with 2 cores 
15/12/29 19:21:11 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151229141057-0000/10 on hostPort 127.0.0.1:48818 with 2 cores, 1024.0 MB RAM 
15/12/29 19:21:11 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/10 is now LOADING 
15/12/29 19:21:11 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/10 is now RUNNING 
15/12/29 19:21:12 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/9 is now EXITED (Command exited with code 1) 
15/12/29 19:21:12 INFO SparkDeploySchedulerBackend: Executor app-20151229141057-0000/9 removed: Command exited with code 1 
15/12/29 19:21:12 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 9 
15/12/29 19:21:12 INFO AppClient$ClientEndpoint: Executor added: app-20151229141057-0000/11 on worker-20151229141023-127.0.0.1-35452 (127.0.0.1:35452) with 2 cores 
15/12/29 19:21:12 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151229141057-0000/11 on hostPort 127.0.0.1:35452 with 2 cores, 1024.0 MB RAM 
15/12/29 19:21:12 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/11 is now LOADING 
15/12/29 19:21:12 INFO AppClient$ClientEndpoint: Executor updated: app-20151229141057-0000/11 is now RUNNING 
15/12/29 19:21:13 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 

I am trying to run a standalone install of Spark on Ubuntu 14.04. Everything appears to be configured correctly, but the job never completes and every worker eventually times out.
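Since the warning above complains about insufficient resources, for reference the application's resource request can also be pinned explicitly so it cannot exceed what the workers advertise; a minimal sketch (the values are illustrative, not my actual settings):

import org.apache.spark.SparkConf

val conf = new SparkConf() 
    .setAppName("Simple App") 
    .setMaster("spark://46.101.xxx.xxx:7077") 
    .set("spark.cores.max", "2")          // cap total cores at what one worker offers 
    .set("spark.executor.memory", "512m") // stay under the 1024.0 MB granted per executor 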

[Screenshot: Spark master web UI on the remote machine, showing two registered workers]

That was the remote machine; this one is from the machine I am executing the job from...

[Screenshot: corresponding view from the machine the job is submitted from]

The code below is just one of their examples. I have also tried the Pi Estimation example and hit the same problem.

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp { 
  def main(args: Array[String]) { 
    val logFile = "/Users/user/spark.txt" // Should be some file on your system 
    val conf = new SparkConf().setAppName("Simple App").setMaster("spark://46.101.xxx.xxx:7077") 
    val sc = new SparkContext(conf) 
    val logData = sc.textFile(logFile, 2).cache() 
    val numAs = logData.filter(line => line.contains("a")).count() 
    val numBs = logData.filter(line => line.contains("b")).count() 
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs)) 
    sc.stop() 
  } 
} 
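One cause I have seen suggested for executors dying with "Command exited with code 1" is that they cannot connect back to the driver. A minimal sketch of pinning the driver's externally reachable address and port, assuming the submitting machine has a known public IP (203.0.113.5 below is a placeholder, not my real address):

val conf = new SparkConf() 
    .setAppName("Simple App") 
    .setMaster("spark://46.101.xxx.xxx:7077") 
    .set("spark.driver.host", "203.0.113.5") // placeholder: public IP of the submitting machine 
    .set("spark.driver.port", "51000")       // fix the port so a firewall rule can allow it 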

Has anyone come across this problem before? Any help in solving it would be greatly appreciated.

--edit-- Additional information:

#spark-env.sh 
export SPARK_LOCAL_IP="46.101.xxx.xxx"   # address Spark binds to on this host 
export SPARK_MASTER_IP="46.101.xxx.xxx"  # address the master listens on 
export SPARK_PUBLIC_DNS="46.101.xxx.xxx" # hostname advertised to other nodes and in the web UI 

Tried Java 7 & Java 8, with Scala 2.10.6 and 2.11.latest.

Master started with ./start-master.sh. Worker started with ./start-slave.sh spark://46.101.xxx.xxx:7077.

Running on Ubuntu 14.04.3 LTS (DigitalOcean), with no firewall. I can telnet from a remote machine to both the master and the workers. The master and workers are on the same machine.
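The same reachability check can also be scripted from the client; a throwaway sketch (portOpen is my own helper, not part of Spark):

import java.net.{InetSocketAddress, Socket}

// Returns true if host:port accepts a TCP connection within the timeout. 
def portOpen(host: String, port: Int, timeoutMs: Int = 3000): Boolean = 
  try { 
    val s = new Socket() 
    s.connect(new InetSocketAddress(host, port), timeoutMs) 
    s.close() 
    true 
  } catch { case _: Exception => false } 

println(portOpen("46.101.xxx.xxx", 7077)) // master RPC port 
println(portOpen("46.101.xxx.xxx", 8080)) // master web UI 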

Tested with Spark 1.5.2 and 1.5.0. Java, Scala, and Spark versions are consistent between the client machine (submitting the job) and the remote server (master and workers).

Answer


It looks like your application cannot find any workers. When you started the cluster, did you start any slaves and attach them to the master?

To start your workers and connect them to the master, run the following:

./bin/spark-class org.apache.spark.deploy.worker.Worker spark://ip:port 

where spark://ip:port is the URL of the master. If a worker registers successfully, it will appear on the master's web UI (port 8080 by default).


You can see in the screenshots above the two workers connected to the master. I was just running the ./start-all.sh command; doing the same manually with ./start-master.sh and ./start-worker.sh has the same effect.