Spark Streaming: java.lang.OutOfMemoryError: Java heap space

I'm trying to run a simple Spark Streaming job written in Python:
#!/usr/bin/env python
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext

# Standalone cluster with two masters (ZooKeeper-based HA)
conf = SparkConf()
conf.setMaster("spark://master1:7077,master2:7077")
sc = SparkContext(conf=conf)

# 1-second batch interval
ssc = StreamingContext(sc, 1)

# Count the lines received on the socket in each batch and print the result
ssc.socketTextStream("master1", 9999).count().pprint()

ssc.start()
ssc.awaitTermination()
After running for a couple of seconds, the task fails. Here is the exception I see:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:155)
at com.esotericsoftware.kryo.io.Output.require(Output.java:135)
at com.esotericsoftware.kryo.io.Output.writeString_slow(Output.java:420)
at com.esotericsoftware.kryo.io.Output.writeString(Output.java:326)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:153)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:158)
at org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:153)
at org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:1190)
at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1199)
at org.apache.spark.storage.MemoryStore.putArray(MemoryStore.scala:132)
at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:169)
at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:143)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:791)
at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
at org.apache.spark.streaming.receiver.BlockManagerBasedBlockHandler.storeBlock(ReceivedBlockHandler.scala:77)
at org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.pushAndReportBlock(ReceiverSupervisorImpl.scala:156)
at org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.pushArrayBuffer(ReceiverSupervisorImpl.scala:127)
at org.apache.spark.streaming.receiver.ReceiverSupervisorImpl$$anon$3.onPushBlock(ReceiverSupervisorImpl.scala:108)
at org.apache.spark.streaming.receiver.BlockGenerator.pushBlock(BlockGenerator.scala:294)
at org.apache.spark.streaming.receiver.BlockGenerator.org$apache$spark$streaming$receiver$BlockGenerator$$keepPushingBlocks(BlockGenerator.scala:266)
at org.apache.spark.streaming.receiver.BlockGenerator$$anon$1.run(BlockGenerator.scala:108)
After that a new task is started, so the job keeps running. However, I'd like to know what I'm missing.
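One detail worth noting from the stack trace: the OutOfMemoryError happens while the receiver serializes a block into the MemoryStore. A minimal sketch of one possible mitigation (my suggestion, not a confirmed fix from the thread) is to pass an explicit storage level that can spill received blocks to disk; StorageLevel.MEMORY_AND_DISK_SER here is an assumption about what might help:

#!/usr/bin/env python
from pyspark import SparkContext, SparkConf, StorageLevel
from pyspark.streaming import StreamingContext

conf = SparkConf()
conf.setMaster("spark://master1:7077,master2:7077")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 1)

# Store received blocks serialized, spilling to disk rather than
# failing when the heap fills up (hypothetical mitigation).
stream = ssc.socketTextStream("master1", 9999,
                              storageLevel=StorageLevel.MEMORY_AND_DISK_SER)
stream.count().pprint()

ssc.start()
ssc.awaitTermination()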
UPDATE
spark-defaults.conf:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 4g
spark.executor.memory 4g
spark.executor.extraJavaOptions -XX:+PrintGCDetails
spark.deploy.recoveryMode ZOOKEEPER
spark.deploy.zookeeper.url master1:2181,master2:2181,master3:2181
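As an aside (my suggestion, not part of the original post): if the receiver simply ingests data faster than the executors can process it, Spark's rate-limiting settings can cap the intake. A minimal sketch of additional spark-defaults.conf entries, assuming Spark 1.5+ for the backpressure flag; the 10000 records/sec limit is an arbitrary placeholder:

spark.streaming.backpressure.enabled true
spark.streaming.receiver.maxRate 10000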
Where does the exception occur? On the driver or the executors? It seems you need to increase your executors' memory. Also, please mention your cluster configuration. – Sumit
I see this exception on the executors. Each executor has 4 GB of RAM. I've updated the question to post my spark-defaults.conf. – facha
What is the size/type of the data the stream receives in each batch? If you have captured GC logs, attach those to the post as well. Your program is simple, but it looks like the rate at which data is being received is too high. Do you see any backlog of tasks in the Spark UI? – Sumit
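To answer the question about per-batch volume, one quick way to measure it (a sketch I'm adding, not from the thread; log_batch_size is a hypothetical helper name) is to log a count from each batch's RDD:

def log_batch_size(time, rdd):
    # Prints how many records arrived in the batch that ended at `time`
    print("batch %s: %d records" % (time, rdd.count()))

ssc.socketTextStream("master1", 9999).foreachRDD(log_batch_size)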