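MongoDB with Spark fails with error code -5

2017-07-19

I am using mongo-spark-connector_2.11-2.0.0.jar to read data from MongoDB. The database is a sharded cluster with 5 config servers, 5 shard servers, and 1 mongos router. My code looks like this: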

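import com.mongodb.spark.MongoSpark

// Load the collection named in spark.mongodb.input.uri as an RDD of Documents
val rdd = MongoSpark.builder().sparkSession(spark).build().toRDD()
rdd.foreach { x =>
  try {
    dosomething(x) // per-document processing, defined elsewhere
  } catch {
    case e: Throwable => e.printStackTrace()
  }
}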

My Spark configuration is:

.config("spark.cores.max", 60)  
.config("spark.executor.cores", 12) 
.config("spark.executor.memory", "32g") 
.config("spark.mongodb.input.uri", "mongodb://192.168.12.161:27017/datab.origin2") 

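The collection contains 27,000,000 documents, and when the Spark application starts, the RDD has 2,500 partitions. After it has been running for a while, my driver gets error code -5: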

Caused by: com.mongodb.MongoCursorNotFoundException: Query failed with error code -5 and error message 'Cursor 2639909050433532364 not found on server 192.168.12.161:27017' on server 192.168.12.161:27017
    at com.mongodb.operation.QueryHelper.translateCommandException(QueryHelper.java:27)
    at com.mongodb.operation.QueryBatchCursor.getMore(QueryBatchCursor.java:213)
    at com.mongodb.operation.QueryBatchCursor.hasNext(QueryBatchCursor.java:103)
    at com.mongodb.MongoBatchCursorAdapter.hasNext(MongoBatchCursorAdapter.java:46)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

I checked the Spark worker log. Task 433 was started only once, and that first attempt failed with this error:

17/07/17 19:14:23 INFO CoarseGrainedExecutorBackend: Got assigned task 433

17/07/17 19:14:23 INFO Executor: Running task 433.0 in stage 0.0 (TID 433)

17/07/17 19:37:31 ERROR Executor: Exception in task 433.0 in stage 0.0 (TID 433) com.mongodb.MongoCursorNotFoundException: Query failed with error code -5 and error message 'Cursor 2639909048849185072 not found on server 192.168.12.161:27017' on server 192.168.12.161:27017

And this is the mongos log:

2017-07-17T19:24:49.677+0800 I QUERY [ClusterCursorCleanupJob] Marking cursor id 2639909048849185072 for deletion, idle since 2017-07-17T19:14:46.055+0800

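I searched for error code -5 and learned that it occurs when a cursor has not been used for 10 minutes, but the other partitions take only 3-4 minutes to finish processing. With the Java driver I can call noCursorTimeout() to avoid this problem. How can I solve it when using mongo-spark-connector? Or can I fix it in the configuration of my sharded cluster?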
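The timestamps in the logs above can be checked with a little arithmetic. The sketch below uses only java.time to compute how long the cursor sat idle before mongos marked it for deletion, and how long task 433 ran. It assumes the 10-minute limit is the default of the server parameter `cursorTimeoutMillis` (600000 ms); the object and method names are hypothetical.

```scala
import java.time.{Duration, LocalDateTime, OffsetDateTime}
import java.time.format.DateTimeFormatter

// Sketch only: re-checks the timestamps quoted in the logs.
// 600000 ms is assumed to be the server's cursorTimeoutMillis default;
// CursorIdleCheck and its methods are hypothetical names.
object CursorIdleCheck {
  val DefaultCursorTimeoutMillis: Long = 600000L // 10 minutes

  // mongos timestamps look like 2017-07-17T19:14:46.055+0800
  private val mongosFmt = DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSZ")
  // Spark timestamps look like 17/07/17 19:14:23 (no time zone)
  private val sparkFmt = DateTimeFormatter.ofPattern("uu/MM/dd HH:mm:ss")

  /** Milliseconds between "idle since" and the moment mongos marked the cursor. */
  def idleMillis(idleSince: String, markedAt: String): Long =
    Duration.between(
      OffsetDateTime.parse(idleSince, mongosFmt),
      OffsetDateTime.parse(markedAt, mongosFmt)
    ).toMillis

  /** True if the idle period reached the cursor timeout. */
  def exceedsTimeout(idleMs: Long, timeoutMs: Long = DefaultCursorTimeoutMillis): Boolean =
    idleMs >= timeoutMs

  /** Wall-clock seconds between two Spark log timestamps. */
  def taskSeconds(start: String, end: String): Long =
    Duration.between(
      LocalDateTime.parse(start, sparkFmt),
      LocalDateTime.parse(end, sparkFmt)
    ).getSeconds

  def main(args: Array[String]): Unit = {
    val idle = idleMillis("2017-07-17T19:14:46.055+0800", "2017-07-17T19:24:49.677+0800")
    val timedOut = exceedsTimeout(idle)
    val taskSecs = taskSeconds("17/07/17 19:14:23", "17/07/17 19:37:31")
    println(s"cursor idle for $idle ms, timed out: $timedOut")
    println(s"task 433 ran for $taskSecs s")
  }
}
```

The cursor was idle for 603,622 ms, just past the 600,000 ms limit, and task 433 ran for 1,388 s (about 23 minutes), far longer than the 3-4 minutes the other partitions need. This matches the condition described above: the failing task's cursor went more than ten minutes without a getMore.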

Answer

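I got the same error when running Spark locally with spark.master=local[16]. I spent a lot of time searching the internet for a solution but found nothing. Finally I tried setting spark.master=local[1], and it worked!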
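For reference, the workaround in this answer is a single builder setting. This is a minimal sketch that assumes the session is created in code, as in the question; the application name is a placeholder. `local[1]` runs Spark locally with one worker thread, so partitions are processed one at a time. That trades away all parallelism, and it is not a confirmed explanation of why the cursors time out.

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the answer's workaround: a single local worker thread.
// "mongo-read" is a placeholder application name.
val spark = SparkSession.builder()
  .appName("mongo-read")
  .master("local[1]") // one thread: only one task (and one cursor) active at a time
  .config("spark.mongodb.input.uri", "mongodb://192.168.12.161:27017/datab.origin2")
  .getOrCreate()
```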