
We are using Spark (via PySpark) and are running into a problem in a VMware ESX 5.5 environment with Ubuntu Server 14.04 LTS virtual machines and Java version "1.8.0_45". Even a simple PySpark example fails.

Running a simple sc.parallelize(['2', '4']).collect() results in:

15/07/28 10:11:42 INFO SparkContext: Starting job: collect at <stdin>:1 
15/07/28 10:11:42 INFO DAGScheduler: Got job 0 (collect at <stdin>:1) with 2 output partitions (allowLocal=false) 
15/07/28 10:11:42 INFO DAGScheduler: Final stage: ResultStage 0(collect at <stdin>:1) 
15/07/28 10:11:42 INFO DAGScheduler: Parents of final stage: List() 
15/07/28 10:11:42 INFO DAGScheduler: Missing parents: List() 
15/07/28 10:11:42 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at PythonRDD.scala:396), which has no missing parents 
15/07/28 10:11:42 INFO TaskSchedulerImpl: Cancelling stage 0 
15/07/28 10:11:42 INFO DAGScheduler: ResultStage 0 (collect at <stdin>:1) failed in Unknown s 
15/07/28 10:11:42 INFO DAGScheduler: Job 0 failed: collect at <stdin>:1, took 0,058933 s 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/opt/spark/spark/python/pyspark/rdd.py", line 745, in collect 
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd()) 
    File "/opt/spark/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__ 
    File "/opt/spark/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. 
: org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.reflect.InvocationTargetException 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
java.lang.reflect.Constructor.newInstance(Constructor.java:422) 
org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:68) 
org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60) 
org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73) 
org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:80) 
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34) 
org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) 
org.apache.spark.SparkContext.broadcast(SparkContext.scala:1289) 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:874) 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:815) 
org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:799) 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1419) 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411) 
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 

    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256) 
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256) 
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:884) 
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:815) 
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:799) 
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1419) 
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411) 
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 

We found this issue describing the same behaviour: https://issues.apache.org/jira/browse/SPARK-9089

Any ideas what is going on? Or what could we try?

Answer


As that issue says:

We faced the same issue, and after a lot of digging and some luck we found the root of the problem.

It is caused by snappy-java extracting its native library into java.io.tmpdir (/tmp by default) and setting the executable flag on the extracted file. If /tmp is mounted with the "noexec" option, snappy-java cannot set the executable flag and throws an exception. See the SnappyLoader.java code.

We fixed the issue by mounting /tmp without the "noexec" option.

Sean Owen: if you want to reproduce the issue, mount /tmp with the "noexec" option, or point java.io.tmpdir at a directory that is mounted with "noexec".

Maybe it would be better for Spark to set the property org.xerial.snappy.tempdir to the value of spark.local.dir, but nothing prevents spark.local.dir from being mounted with "noexec" as well.
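
Building on that last point, here is a minimal sketch of one possible workaround (not an official fix): redirect the JVM temp directory and snappy-java's own tempdir property to a directory that is not mounted noexec. The path /var/tmp/spark is an assumption; use any writable, exec-mounted directory. Note that spark.driver.extraJavaOptions set from code typically only takes effect in cluster mode; for a local pyspark shell the same flags would have to be passed when the driver JVM is launched (e.g. via spark-submit --driver-java-options).

from pyspark import SparkConf, SparkContext

# Assumed path: any writable directory on a filesystem mounted WITHOUT noexec.
TMPDIR = "/var/tmp/spark"
java_opts = "-Djava.io.tmpdir={0} -Dorg.xerial.snappy.tempdir={0}".format(TMPDIR)

conf = (SparkConf()
        .setAppName("snappy-noexec-workaround")
        # Executor JVMs pick these options up from the application's SparkConf.
        .set("spark.executor.extraJavaOptions", java_opts)
        # The driver usually needs these at launch time instead (see note above),
        # since in client mode the driver JVM is already running when this is read.
        .set("spark.driver.extraJavaOptions", java_opts))

sc = SparkContext(conf=conf)
print(sc.parallelize(['2', '4']).collect())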

Removing noexec from the /tmp mount point solved the problem for us.
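
To confirm whether this is actually the situation on a given machine before changing any mounts, a small Linux-only sketch that parses /proc/mounts and reports whether a mount point carries the noexec flag (the mount point is only matched if it appears as its own entry in /proc/mounts):

def mounted_noexec(mount_point="/tmp"):
    """Return True if mount_point is listed in /proc/mounts with the noexec option."""
    with open("/proc/mounts") as mounts:
        for line in mounts:
            fields = line.split()
            # /proc/mounts format: device mount_point fstype options dump pass
            if len(fields) >= 4 and fields[1] == mount_point:
                return "noexec" in fields[3].split(",")
    return False  # not a separate mount point; it inherits its parent's options

print(mounted_noexec("/tmp"))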