2016-05-24
7

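I'm trying to test Spark 1.6 with HDFS on AWS, using the wordcount Python example provided in the examples folder. I submit the job with spark-submit. It completes successfully and prints the results to the console, and the web UI also shows it as completed. However, spark-submit never terminates. I have verified that the context is stopped in the wordcount example code as well. After the job finishes, spark-submit just keeps hanging.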

What could be wrong?

This is what I see on the console:

6-05-24 14:58:04,749 INFO [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/stages/stage,null} 
2016-05-24 14:58:04,749 INFO [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/stages/json,null} 
2016-05-24 14:58:04,749 INFO [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/stages,null} 
2016-05-24 14:58:04,749 INFO [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null} 
2016-05-24 14:58:04,750 INFO [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs/job,null} 
2016-05-24 14:58:04,750 INFO [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs/json,null} 
2016-05-24 14:58:04,750 INFO [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs,null} 
2016-05-24 14:58:04,802 INFO [Thread-3] ui.SparkUI (Logging.scala:logInfo(58)) - Stopped Spark web UI at http://172.30.2.239:4040 
2016-05-24 14:58:04,805 INFO [Thread-3] cluster.SparkDeploySchedulerBackend (Logging.scala:logInfo(58)) - Shutting down all executors 
2016-05-24 14:58:04,805 INFO [dispatcher-event-loop-2] cluster.SparkDeploySchedulerBackend (Logging.scala:logInfo(58)) - Asking each executor to shut down 
2016-05-24 14:58:04,814 INFO [dispatcher-event-loop-5] spark.MapOutputTrackerMasterEndpoint (Logging.scala:logInfo(58)) - MapOutputTrackerMasterEndpoint stopped! 
2016-05-24 14:58:04,818 INFO [Thread-3] storage.MemoryStore (Logging.scala:logInfo(58)) - MemoryStore cleared 
2016-05-24 14:58:04,818 INFO [Thread-3] storage.BlockManager (Logging.scala:logInfo(58)) - BlockManager stopped 
2016-05-24 14:58:04,820 INFO [Thread-3] storage.BlockManagerMaster (Logging.scala:logInfo(58)) - BlockManagerMaster stopped 
2016-05-24 14:58:04,821 INFO [dispatcher-event-loop-3] scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint (Logging.scala:logInfo(58)) - OutputCommitCoordinator stopped! 
2016-05-24 14:58:04,824 INFO [Thread-3] spark.SparkContext (Logging.scala:logInfo(58)) - Successfully stopped SparkContext 
2016-05-24 14:58:04,827 INFO [sparkDriverActorSystem-akka.actor.default-dispatcher-2] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Shutting down remote daemon. 
2016-05-24 14:58:04,828 INFO [sparkDriverActorSystem-akka.actor.default-dispatcher-2] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Remote daemon shut down; proceeding with flushing remote transports. 
2016-05-24 14:58:04,843 INFO [sparkDriverActorSystem-akka.actor.default-dispatcher-2] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting shut down. 

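I have to press Ctrl-C to terminate the spark-submit process. This is a really strange problem and I don't know how to fix it. Please let me know if there are any logs I should look at, or anything I should be doing differently here.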

Here is the jstack output of the spark-submit process, on Pastebin: http://pastebin.com/Nfnt4XmT
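The JVM only exits once every non-daemon Java thread has finished, so one way to read a jstack dump like the one linked above is to list the non-daemon threads still in it. The following is a minimal sketch, not a Spark tool: the function name non_daemon_threads is hypothetical, and it assumes the usual HotSpot jstack layout, where each Java thread begins with a header line holding its quoted name (with the word daemon when applicable) followed by a "java.lang.Thread.State:" line, while VM-internal threads such as "VM Thread" have no state line.

```python
import re

# Assumed jstack header layout: the thread name in double quotes, then attributes,
# e.g. '"main" #1 prio=5 ...' (Java 8) or '"Signal Dispatcher" daemon prio=10 ...' (Java 7).
_HEADER = re.compile(r'^"(?P<name>[^"]*)"(?P<attrs>.*)$')


def non_daemon_threads(jstack_text):
    """Return the names of non-daemon Java threads found in a jstack dump."""
    lines = jstack_text.splitlines()
    names = []
    for i, line in enumerate(lines):
        match = _HEADER.match(line)
        if match is None:
            continue
        attrs = match.group("attrs").split()
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        # Java threads are followed by a state line; VM-internal threads are not.
        is_java_thread = "java.lang.Thread.State:" in next_line
        # DestroyJavaVM is the thread waiting for the others to finish, so skip it.
        if is_java_thread and "daemon" not in attrs and match.group("name") != "DestroyJavaVM":
            names.append(match.group("name"))
    return names
```

Usage: save a dump with jstack <pid> > dump.txt, then call non_daemon_threads(open("dump.txt").read()). Any names it returns (other than threads you expect to finish) are candidates for what keeps the process alive.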

+0

I don't know Python, but I would check which threads are still alive even after the Spark context is shut down. See http://stackoverflow.com/questions/4046986/python-how-to-get-the-numebr-of-active-threads-started-by-specific-class
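On the Python side of the driver, the interpreter waits for all live non-daemon threads before it exits, so a quick check like the one suggested in the comment above can be done with the standard threading module. This is a minimal sketch assuming Python 3.4+ (for threading.main_thread()); the function name lingering_threads is hypothetical, and it only sees Python threads in the driver script, not JVM threads (those show up in jstack instead).

```python
import threading


def lingering_threads():
    """Names of live non-daemon threads other than the main thread.

    The interpreter waits for these before exiting, so any that remain
    after sc.stop() could keep the driver script from terminating.
    """
    main = threading.main_thread()
    return [
        t.name
        for t in threading.enumerate()  # enumerate() lists only live threads
        if t is not main and not t.daemon
    ]
```

For example, print(lingering_threads()) right after sc.stop() at the end of the script; an empty list means no Python thread is holding the interpreter open.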

+0

You may have to stop the context by running 'sc.stop()'.

+0

I have already stopped the Spark context; it's in the post.

Answer

-2

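You could try running your spark-submit command with nohup and putting the "&" operator at the end. As far as I can tell from the log you pasted, the Spark context has stopped, and the only problem is that this is not reflected in the terminal. Correct me if I'm wrong.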

nohup spark-submit --master yarn --deploy-mode client --driver-memory=4G --num-executors=12 --executor-memory=4G --conf spark.yarn.driver.memoryOverhead=800 --conf spark.yarn.executor.memoryOverhead=800 --conf spark.kryoserializer.buffer.max=3G your_python_file.py > your_log_file.log &
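If you launch jobs from a Python script rather than a shell, roughly the same effect as the nohup ... & command above can be had with the standard subprocess module. This is a minimal sketch, and the helper name launch_detached is hypothetical. The child gets its own session via start_new_session=True (POSIX only), so a terminal hang-up does not reach it, and its stdout and stderr go to a log file. Note that, like the shell version, this only detaches the process from the terminal; if spark-submit itself hangs, it will still be running in the background.

```python
import subprocess


def launch_detached(cmd, log_path):
    """Start cmd in the background with stdout and stderr written to log_path.

    Approximates `nohup cmd > log_path &`: no stdin, output to a file,
    and a new session so the child is not tied to the current terminal.
    """
    with open(log_path, "wb") as log:  # "wb" truncates, like ">" in the shell
        # The child keeps its own copy of the file descriptor after we close ours.
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # POSIX: calls setsid() in the child
        )
```

For example, launch_detached(["spark-submit", "--master", "yarn", "--deploy-mode", "client", "your_python_file.py"], "your_log_file.log") returns a Popen object whose .pid can later be passed to jstack or kill.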