WARN ReliableDeliverySupervisor: Association with remote system has failed, address is now gated for [5000] ms. Reason: [Disassociated]

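I am running the following statements on AWS, and they trigger "WARN ReliableDeliverySupervisor: Association with remote system has failed, address is now gated for [5000] ms. Reason: [Disassociated]":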

val sqlContext = new org.apache.spark.sql.SQLContext(sc) 
import sqlContext.implicits._ 

case class Wiki(project: String, title: String, count: Int, byte_size: String) 

val data = sc.textFile("s3n://+++/").map(_.split(" ")).filter(_.size ==4).map(p => Wiki(p(0), p(1), p(2).trim.toInt, p(3))) 

val df = data.toDF() 
df.printSchema() 

val en_agg_df = df.filter("project = 'en'").select("title","count").groupBy("title").sum().collect()

The job runs for about 2 hours, and then the following errors appear:

WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:42514] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
15/10/15 17:38:36 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster has disassociated: 172.31.14.190:42514 
15/10/15 17:38:36 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster has disassociated: 172.31.14.190:42514 
15/10/15 17:38:36 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:43340] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
15/10/15 17:38:36 ERROR YarnScheduler: Lost executor 1 on ip-172-31-14-190.ap-northeast-1.compute.internal: remote Rpc client disassociated 
15/10/15 17:38:36 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0 
15/10/15 17:38:36 WARN TaskSetManager: Lost task 4736.0 in stage 0.0 (TID 4736, ip-172-31-14-190.ap-northeast-1.compute.internal): ExecutorLostFailure (executor 1 lost) 
15/10/15 17:38:36 INFO DAGScheduler: Executor lost: 1 (epoch 0) 
15/10/15 17:38:36 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster. 
15/10/15 17:38:36 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, ip-172-31-14-190.ap-northeast-1.compute.internal, 58890) 
15/10/15 17:38:36 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor 
15/10/15 17:38:36 ERROR YarnScheduler: Lost executor 2 on ip-172-31-14-190.ap-northeast-1.compute.internal: remote Rpc client disassociated 
15/10/15 17:38:36 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:60961] has failed, address is now gated for [5000] ms. Reason: [Disassociated] 
15/10/15 17:38:36 INFO TaskSetManager: Re-queueing tasks for 2 from TaskSet 0.0 
15/10/15 17:38:36 WARN TaskSetManager: Lost task 4735.0 in stage 0.0 (TID 4735, ip-172-31-14-190.ap-northeast-1.compute.internal): ExecutorLostFailure (executor 2 lost) 
15/10/15 17:38:36 INFO DAGScheduler: Executor lost: 2 (epoch 0) 
15/10/15 17:38:36 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster. 
15/10/15 17:38:36 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, ip-172-31-14-190.ap-northeast-1.compute.internal, 58811) 
15/10/15 17:38:36 INFO BlockManagerMaster: Removed 2 successfully in removeExecutor

What does this mean, and how can I fix it?

2015-10-15 Hello lad

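The executors may have run out of memory. You should check the container logs of the lost executors, and possibly also the YARN NodeManager logs on the nodes they were running on. – ChristopherB

@ChristopherB Thanks a lot for the comment! –

Did it work, or did you find more error information? – ChristopherB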

The answer was already given in the comments:

It seems to have been an out-of-memory condition on the executors, because the job ran fine after I added more machines to the cluster.
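
If adding machines is not an option, another common response to executors running out of memory on YARN is to give each executor more memory. The following is a minimal sketch for a standalone application, not something taken from the thread: the object name, the application name, and the memory values are all illustrative assumptions. `spark.yarn.executor.memoryOverhead` is the property name used by Spark 1.x, which matches the 2015 logs above; later releases renamed it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone driver; every value here is an assumption
// chosen for illustration and should be tuned to the actual cluster.
object WikiAggApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("wiki-agg")                           // hypothetical application name
      .set("spark.executor.memory", "4g")               // JVM heap per executor (assumed value)
      .set("spark.yarn.executor.memoryOverhead", "768") // extra off-heap room per container, in MB (assumed value)

    // Executor memory settings must be in place before the context is created.
    val sc = new SparkContext(conf)
    try {
      // ... run the job from the question here ...
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell, where `sc` already exists by the time you can type anything, these properties cannot be changed from inside the session; they would have to be supplied when the shell is launched.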

2017-08-08 13:45:44
