2016-01-21 41 views
0
  • I run my Spark application with the following runtime configuration:

    spark-submit --executor-memory 8G --spark.yarn.executor.memoryOverhead 2G

    But it still raises the out-of-memory error listed below ("Spark application java.lang.OutOfMemoryError: Direct buffer memory"):

    I have a pairRDD with 8,362,269,460 rows and a partition size of 128. The error occurs when I call pairRDD.groupByKey.saveAsTextFile. Any clue about this error?

    Update: I added a filter, and the data is now down to 2,300,000,000 rows. Running it in the spark-shell, there is no error. My cluster: 19 datanodes, 1 namenode

       Min Resources: <memory:150000, vCores:150> 
          Max Resources: <memory:300000, vCores:300> 
    

    Thanks for your help.
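As an aside, `--spark.yarn.executor.memoryOverhead` is not a valid spark-submit flag; Spark properties like this are normally passed through `--conf`. A sketch of the command from the question with the overhead actually applied (`myApp.jar` is a placeholder for the real application jar):

```shell
# spark.yarn.executor.memoryOverhead is a Spark property, not a
# spark-submit option, so it must go through --conf (value in MB).
spark-submit \
  --executor-memory 8G \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  myApp.jar
```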

    org.apache.spark.shuffle.FetchFailedException: java.lang.OutOfMemoryError: Direct buffer memory 
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:321) 
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:306) 
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51) 
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) 
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) 
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32) 
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) 
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:132) 
        at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:60) 
        at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:89) 
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:90) 
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) 
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) 
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) 
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
        at org.apache.spark.scheduler.Task.run(Task.scala:88) 
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
        at java.lang.Thread.run(Thread.java:745) 
    Caused by: io.netty.handler.codec.DecoderException: Direct buffer memory 
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:234) 
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) 
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) 
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) 
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) 
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) 
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
        ... 1 more 
    Caused by: java.lang.OutOfMemoryError: Direct buffer memory 
        at java.nio.Bits.reserveMemory(Bits.java:658) 
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) 
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
        at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:651) 
        at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237) 
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:215) 
        at io.netty.buffer.PoolArena.reallocate(PoolArena.java:358) 
        at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:121) 
        at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251) 
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849) 
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841) 
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831) 
        at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92) 
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:228) 
        ... 10 more 
    

    I would like to know how to configure the direct memory size correctly. Best regards

    +1

    Please format your question properly and give it some context – manRo

    +0

    @ssyue -XX:MaxDirectMemorySize –

    +0

    @manRo Sorry, English is my weak point. – ssyue

    Answer

    2

    I don't know any details of your Spark application, but I think the memory configuration here needs -XX:MaxDirectMemorySize to be set, just like any other JVM memory setting (passed via -XX:). Try using spark.executor.extraJavaOptions.

    If you are using spark-submit, you can use:

    ./bin/spark-submit --name "My app" ... 
        --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:MaxDirectMemorySize=512m" myApp.jar 
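One caveat worth noting (the values below are illustrative, not from the question): on YARN, the direct buffers Netty allocates during shuffle fetches live outside the JVM heap, so the limit set with -XX:MaxDirectMemorySize has to fit inside spark.yarn.executor.memoryOverhead, or YARN may kill the container for exceeding its memory allowance. A sketch combining both settings:

```shell
# Illustrative sketch: keep MaxDirectMemorySize (off-heap) smaller than
# spark.yarn.executor.memoryOverhead (in MB), since the overhead budget
# must cover direct buffers and other non-heap memory.
./bin/spark-submit --name "My app" \
  --executor-memory 8G \
  --conf spark.yarn.executor.memoryOverhead=3072 \
  --conf "spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize=2g" \
  myApp.jar
```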
    
    +0

    But this memory error rather suggests that your application itself has a memory problem, for example reading the entire stream content into an in-memory buffer –

    +0

    I will give it a try. Thank you. – ssyue

    +0

    That didn't work. Any other solutions? – ssyue