Spark 1.1: "Too many open files" during shuffle
I have 2 datasets. One is very large; the other is reduced (using some 1:100 filtering) to a much smaller scale. I need to shrink the large dataset to the same scale by joining each item in the smaller list with its corresponding item in the larger one (the lists contain elements with a mutual join field).
I do that with the code below:
- the "if (joinKeys != null)" part is the relevant part
- the small list is "joinKeys", the large one is "keyedEvents"
private static JavaRDD<ObjectNode> createOutputType(JavaRDD jsonsList, final String type, String outputPath, JavaPairRDD<String, String> joinKeys) {

    outputPath = outputPath + "/" + type;
    JavaRDD events = jsonsList.filter(new TypeFilter(type));

    // This is in case we need to narrow the list to match some other list of ids... Recommendation List, for example... :)
    if (joinKeys != null) {
        JavaPairRDD<String, ObjectNode> keyedEvents = events.mapToPair(new KeyAdder("requestId"));
        JavaRDD<ObjectNode> joinedEvents = joinKeys.join(keyedEvents).values().map(new PairToSecond());
        events = joinedEvents;
    }

    JavaPairRDD<String, Iterable<ObjectNode>> groupedEvents = events.mapToPair(new KeyAdder("sliceKey")).groupByKey();
    // Convert jsons to strings and add "\n" at the end of each
    JavaPairRDD<String, String> groupedStrings = groupedEvents.mapToPair(new JsonsToStrings());
    groupedStrings.saveAsHadoopFile(outputPath, String.class, String.class, KeyBasedMultipleTextOutputFormat.class);
    return events;
}
The thing is, when running this job I always get the same error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2757 in stage 13.0 failed 4 times, most recent failure: Lost task 2757.3 in stage 13.0 (TID 47681, hadoop-w-175.c.taboola-qa-01.internal): java.io.FileNotFoundException: /hadoop/spark/tmp/spark-local-20141201184944-ba09/36/shuffle_6_2757_2762 (Too many open files)
java.io.FileOutputStream.open(Native Method)
java.io.FileOutputStream.<init>(FileOutputStream.java:221)
org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:123)
org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
I have already increased my ulimits by doing the following on all the cluster machines:
echo "* soft nofile 900000" >> /etc/security/limits.conf
echo "root soft nofile 900000" >> /etc/security/limits.conf
echo "* hard nofile 990000" >> /etc/security/limits.conf
echo "root hard nofile 990000" >> /etc/security/limits.conf
echo "session required pam_limits.so" >> /etc/pam.d/common-session
echo "session required pam_limits.so" >> /etc/pam.d/common-session-noninteractive
But that does not solve my problem...
After increasing the file descriptor limits, did you also restart all the Spark daemons (and possibly start a new session on the node running the driver) so that they pick up the new limits? Also, are you running Spark Standalone or Spark on YARN? If you are on YARN, restarting all the YARN daemons may help as well (for the same reason). – 2014-12-01 22:04:14
I'm using GCE, so I deploy a new cluster every time. Also, the ulimits are set during the cluster initialization phase, before the job runs. Finally, I'm not using YARN; I'm using Spark Standalone mode. – 2014-12-01 22:42:56
Raising your 'ulimit's should work, but you could also try changing 'spark.shuffle.manager' to the new 'SORT' manager, per the [configuration guide here](http://spark.apache.org/docs/1.1.0/configuration.html#shuffle-behavior). – 2014-12-02 21:02:32
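For illustration, a minimal sketch of switching to the sort-based shuffle suggested in the comment above, assuming the job is configured through a plain SparkConf in the driver (class and app name here are hypothetical):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SortShuffleJob {
    public static void main(String[] args) {
        // Switch from the default hash-based shuffle (which opens one file per
        // map task per reduce task) to the sort-based shuffle added in Spark 1.1,
        // which writes far fewer files per shuffle task.
        SparkConf conf = new SparkConf()
                .setAppName("SortShuffleJob")           // hypothetical app name
                .set("spark.shuffle.manager", "SORT");  // value per the Spark 1.1 docs
        JavaSparkContext sc = new JavaSparkContext(conf);

        // ... build the RDDs and call createOutputType(...) as before ...

        sc.stop();
    }
}

The same property can also be set cluster-wide in conf/spark-defaults.conf instead of in the driver code.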