
PySpark memory issue with a join condition

I am using Spark 2.1.0. I have two dataframes, neither larger than 3 MB. When I run an inner join on the two dataframes, all of my transformation logic works perfectly. However, when I join the two dataframes with a right outer join, I get the error below.

Error

Container killed by YARN for exceeding memory limits. 1.5 GB of 1.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
17/08/02 02:29:53 ERROR cluster.YarnScheduler: Lost executor 337 on ip-172-21-1-105.eu-west-1.compute.internal: Container killed by YARN for exceeding memory limits. 1.5 GB of 1.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
17/08/02 02:29:53 WARN scheduler.TaskSetManager: Lost task 34.0 in stage 283.0 (TID 11396, ip-172-21-1-105.eu-west-1.compute.internal, executor 337): ExecutorLostFailure (executor 337 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 1.5 GB of 1.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
17/08/02 02:29:53 WARN server.TransportChannelHandler: Exception in connection from /172.21.1.105:50342
java.io.IOException: Connection reset by peer
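
For reference, a minimal sketch of the two joins (df1, df2, and the join key "id" are hypothetical names, not taken from my actual job):

    # Hypothetical reproduction: df1 and df2 stand in for the two small dataframes.
    inner = df1.join(df2, on="id", how="inner")        # completes fine
    right = df1.join(df2, on="id", how="right_outer")  # container killed by YARN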

I have tried the following alternatives: 1) df.coalesce(x).show() (for various values of x), and 2) setting the executor memory. Neither had any effect.
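
Following the log's own suggestion, the overhead can also be raised when the session is created. A minimal sketch with illustrative, untuned values (on Spark 2.1 the YARN-specific property is spark.yarn.executor.memoryOverhead, in MB):

    from pyspark.sql import SparkSession

    # Illustrative values only; both properties must be set before the
    # executors launch, e.g. here or via spark-submit --conf.
    spark = (SparkSession.builder
             .appName("right-outer-join-debug")
             .config("spark.executor.memory", "2g")
             .config("spark.yarn.executor.memoryOverhead", "1024")
             .getOrCreate())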

This issue has been unresolved for the past few weeks. Could anyone please point out where I am going wrong?

Answer


Could you please share some details about the datasets:

  1. How many rows and columns does each of the two datasets have?

  2. Have you tried a leftOuterJoin, and does it also give you the same error? (See the one-liner below.)
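
For example, with the same hypothetical names as in the sketch above:

    left = df1.join(df2, on="id", how="left_outer")  # does this fail the same way?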

Regards,

Neeraj