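MongoDB with Spark fails with error code -5

I am using mongo-spark-connector_2.11-2.0.0.jar to read data from MongoDB, which is a sharded cluster with 5 config servers, 5 shard servers and 1 mongos. My code looks like this: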
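import com.mongodb.spark.MongoSpark

// Load the collection configured in spark.mongodb.input.uri as an RDD of documents.
val rdd = MongoSpark.builder().sparkSession(spark).build.toRDD()
rdd.foreach { x =>
  try {
    dosomething(x)
  } catch {
    // Log the failure and keep processing the remaining documents.
    case e: Throwable => e.printStackTrace()
  }
}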
And my Spark configuration is:
.config("spark.cores.max", 60)
.config("spark.executor.cores", 12)
.config("spark.executor.memory", "32g")
.config("spark.mongodb.input.uri", "mongodb://192.168.12.161:27017/datab.origin2")
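There are 27,000,000 documents in the collection, and when the Spark application starts, the RDD has 2500 partitions. After running for a while, I got error code -5 in my driver: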
Caused by: com.mongodb.MongoCursorNotFoundException: Query failed with error code -5 and error message 'Cursor 2639909050433532364 not found on server 192.168.12.161:27017' on server 192.168.12.161:27017
    at com.mongodb.operation.QueryHelper.translateCommandException(QueryHelper.java:27)
    at com.mongodb.operation.QueryBatchCursor.getMore(QueryBatchCursor.java:213)
    at com.mongodb.operation.QueryBatchCursor.hasNext(QueryBatchCursor.java:103)
    at com.mongodb.MongoBatchCursorAdapter.hasNext(MongoBatchCursorAdapter.java:46)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
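For scale, the figures quoted above work out roughly as follows. This is only a back-of-the-envelope sketch: it assumes documents are spread evenly across partitions, that spark.task.cpus is left at its default of 1, and that the cluster actually grants all 60 cores. The object name JobShape is made up for illustration.

```scala
object JobShape {
  // Figures quoted in the question.
  val totalDocuments: Long = 27000000L
  val partitions: Int = 2500
  val coresMax: Int = 60       // spark.cores.max
  val executorCores: Int = 12  // spark.executor.cores

  // Assumption: documents are spread evenly across partitions.
  val avgDocsPerPartition: Long = totalDocuments / partitions

  // Assumption: every requested core is granted and each task uses one core.
  val executors: Int = coresMax / executorCores
  val concurrentTasks: Int = executors * executorCores

  // Number of scheduling rounds needed to run every partition once.
  val waves: Int = math.ceil(partitions.toDouble / concurrentTasks).toInt

  def main(args: Array[String]): Unit = {
    println(s"average documents per partition: $avgDocsPerPartition")
    println(s"executors: $executors, concurrent tasks: $concurrentTasks")
    println(s"task waves: $waves")
  }
}
```

Under those assumptions each partition holds about 10,800 documents, and up to 60 partitions are read from the single mongos at the same time.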
I checked the Spark worker log. This was the first attempt of task 433, and that first attempt returned the error:
17/07/17 19:14:23 INFO CoarseGrainedExecutorBackend: Got assigned task 433
17/07/17 19:14:23 INFO Executor: Running task 433.0 in stage 0.0 (TID 433)
17/07/17 19:37:31 ERROR Executor: Exception in task 433.0 in stage 0.0 (TID 433) com.mongodb.MongoCursorNotFoundException: Query failed with error code -5 and error message 'Cursor 2639909048849185072 not found on server 192.168.12.161:27017' on server 192.168.12.161:27017
And this is the mongos log:
2017-07-17T19:24:49.677+0800 I QUERY [ClusterCursorCleanupJob] Marking cursor id 2639909048849185072 for deletion, idle since 2017-07-17T19:14:46.055+0800
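The timestamps in the worker log and the mongos log above refer to the same cursor id (2639909048849185072), so they can be lined up directly. The sketch below does that arithmetic with java.time. It assumes both logs were written in the same local time zone, and the names CursorTimeline, sparkTime and mongosTime are invented for this example.

```scala
import java.time.{Duration, LocalDateTime}
import java.time.format.DateTimeFormatter

object CursorTimeline {
  // Spark executor log timestamps look like "17/07/17 19:14:23".
  private val sparkFormat = DateTimeFormatter.ofPattern("yy/MM/dd HH:mm:ss")

  def sparkTime(s: String): LocalDateTime = LocalDateTime.parse(s, sparkFormat)

  // mongos timestamps look like "2017-07-17T19:24:49.677+0800". The offset is dropped
  // on the assumption that both logs use the same local time zone.
  def mongosTime(s: String): LocalDateTime = LocalDateTime.parse(s.take(23))

  val taskStarted: LocalDateTime = sparkTime("17/07/17 19:14:23")
  val taskFailed: LocalDateTime = sparkTime("17/07/17 19:37:31")
  val cursorIdleSince: LocalDateTime = mongosTime("2017-07-17T19:14:46.055+0800")
  val cursorMarked: LocalDateTime = mongosTime("2017-07-17T19:24:49.677+0800")

  // How long the cursor sat unused before mongos marked it for deletion.
  val idleBeforeCleanup: Duration = Duration.between(cursorIdleSince, cursorMarked)

  // How long the task ran before it hit the missing cursor.
  val taskRuntime: Duration = Duration.between(taskStarted, taskFailed)

  def main(args: Array[String]): Unit = {
    println(s"cursor idle before cleanup: ${idleBeforeCleanup.toMillis} ms")
    println(s"task runtime before failure: ${taskRuntime.getSeconds} s")
  }
}
```

The cursor was last used 23 seconds after the task started and was marked for deletion just over 10 minutes later, which matches MongoDB's default idle cursor timeout. The task itself only failed about 23 minutes after it started, when it next went back to the server for more documents.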
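I looked up error code -5 and learned that it occurs when a cursor has not been used for 10 minutes. However, the other partitions need only 3-4 minutes to finish processing. With the Java driver I can use noCursorTimeout() to avoid this problem. How can I solve it when using mongo-spark-connector? Or is there something I can change on my sharded cluster to solve it?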
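For reference, the plain Java driver approach mentioned above looks roughly like the sketch below, written in Scala. It assumes the 3.x MongoDB Java driver is on the classpath and reuses the URI from the Spark configuration. The object name NoTimeoutScan and the process function (a stand-in for dosomething) are hypothetical. It reads the collection through a single client, so it has none of Spark's parallelism and is not a connector-level fix.

```scala
import com.mongodb.{MongoClient, MongoClientURI}
import org.bson.Document

object NoTimeoutScan {
  // Hypothetical stand-in for the question's dosomething(x).
  def process(doc: Document): Unit = println(doc.toJson)

  def main(args: Array[String]): Unit = {
    val uri = new MongoClientURI("mongodb://192.168.12.161:27017/datab.origin2")
    val client = new MongoClient(uri)
    try {
      val collection = client.getDatabase(uri.getDatabase).getCollection(uri.getCollection)
      // noCursorTimeout(true) asks the server not to close the cursor when it is idle.
      val cursor = collection.find().noCursorTimeout(true).iterator()
      try {
        while (cursor.hasNext) process(cursor.next())
      } finally {
        // The idle timeout no longer reaps this cursor, so close it explicitly.
        cursor.close()
      }
    } finally {
      client.close()
    }
  }
}
```

The explicit close matters here: a cursor opened this way can stay open on the server if the client exits without closing it.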