2015-11-09

Cassandra 2.1.8: adding a new node runs out of memory

I have a Cassandra 2.1.8 cluster with 16 nodes (CentOS 6.6, 1x 4-core Xeon, 32 GB RAM, 3x 3 TB HDD, Java 1.8.0_65) and am trying to add 16 more, one at a time, but I am stuck on the first one.

After the Cassandra process is started on the new node, streaming begins from the 16 pre-existing nodes to the newly added one:

nodetool netstats |grep Already 
Receiving 131 files, 241797656689 bytes total. Already received 100 files, 30419228367 bytes total 
Receiving 150 files, 227954962242 bytes total. Already received 116 files, 29078363255 bytes total 
Receiving 127 files, 239902942980 bytes total. Already received 103 files, 29680298986 bytes total 
    ... 

The new node is in the joining state (last line):

UN ...70 669.64 GB 256 ? a9c8adae-e54e-4e8e-a333-eb9b2b52bfed R0  
UN ...71 638.09 GB 256 ? 6aa8cf0c-069a-4049-824a-8359d1c58e59 R0  
UN ...80 667.07 GB 256 ? 7abb5609-7dca-465a-a68c-972e54469ad6 R1 
UJ ...81 102.99 GB 256 ? c20e431e-7113-489f-b2c3-559bbd9916e2 R2 

For a few hours the join looks normal, but after that the Cassandra process on the new node dies with an OOM exception:

ERROR 09:07:37 Exception in thread Thread[Thread-1822,5,main] 
java.lang.OutOfMemoryError: Java heap space 
     at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65] 
java.lang.OutOfMemoryError: Java heap space 
     at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167) 
     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
     at java.lang.Thread.run(Thread.java:745) 
java.lang.OutOfMemoryError: Java heap space 
     at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167) 
     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
     at java.lang.Thread.run(Thread.java:745) 
java.lang.OutOfMemoryError: Java heap space 
     at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167) 
     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
     at java.lang.Thread.run(Thread.java:745) 

I made 6 or 7 attempts, with both CMS and G1 GC and with MAX_HEAP_SIZE from 8 GB (the default) up to 16 GB, with no luck. Cassandra seems to hit the OOM in different places, with varying stack traces:

ERROR [CompactionExecutor:6] 2015-11-08 04:42:24,277 CassandraDaemon.java:223 - Exception in thread Thread[CompactionExecutor:6,1,main] 
java.lang.OutOfMemoryError: Java heap space 
     at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:75) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:70) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:48) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createPooledReader(CompressedPoolingSegmentedFile.java:95) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1822) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.columniterator.IndexedSliceReader.setToRowStart(IndexedSliceReader.java:107) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(IndexedSliceReader.java:83) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:42) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:246) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:270) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1967) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1810) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:357) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(SliceQueryPager.java:90) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:85) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(SliceQueryPager.java:38) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:155) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:144) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:427) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at org.apache.cassandra.db.compaction.CompactionManager$10.run(CompactionManager.java:1144) ~[apache-cassandra-2.1.8.jar:2.1.8] 
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65] 
     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_65] 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_65] 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65] 
     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65] 

Further increasing MAX_HEAP_SIZE led to the Cassandra process being killed by the system OOM killer.
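For reference, the heap experiments above were made by overriding the calculated defaults in conf/cassandra-env.sh, along these lines (a sketch; the exact values were varied per attempt, and HEAP_NEWSIZE only matters for CMS):

```shell
# conf/cassandra-env.sh -- override the auto-calculated heap sizes
# (MAX_HEAP_SIZE and HEAP_NEWSIZE must be set together)
MAX_HEAP_SIZE="16G"
HEAP_NEWSIZE="1600M"
```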

Any ideas?


Are you using some form of row cache on the new node? –


No, it is turned off –

Answer


I ran into exactly the same problem (see my JIRA ticket), and it appears to be related to tables with a lot of tombstones (size-tiered compaction often does not clean them up well). A potential stopgap is to simply restart the node with auto_bootstrap set to false and then run nodetool rebuild to complete the process. This keeps the existing data while allowing the new node to serve traffic.
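The workaround above can be sketched as the following sequence (paths and service names assume a package install; adjust to your environment):

```shell
# 1. Stop the stuck joining node
sudo service cassandra stop

# 2. Disable bootstrap streaming for the next start
echo 'auto_bootstrap: false' | sudo tee -a /etc/cassandra/conf/cassandra.yaml

# 3. Start the node; it joins the ring with whatever data it already has
sudo service cassandra start

# 4. Stream the missing data from the other nodes in the background
nodetool rebuild
```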

However, you may still have an underlying problem causing the OOM. Something very large is (apparently) being materialized into memory during the streaming session, and it is most likely one of:

  1. A very large partition, which can happen unexpectedly. Check cfstats and look at the maximum partition bytes. If that is the cause, you need to address the underlying data-model problem and clean up that data.

  2. Lots of tombstones. You should see warnings about this in the logs.
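Both checks above can be run from the shell (a sketch; the keyspace and table names are placeholders, and the log path assumes a package install):

```shell
# 1. Look for oversized partitions in the per-table statistics
nodetool cfstats my_keyspace.my_table | grep 'Compacted partition maximum bytes'

# 2. Look for tombstone overwhelm warnings in the system log
grep -i tombstone /var/log/cassandra/system.log
```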

If you do hit one of these problems, you will almost certainly have to fix it before streaming can succeed.


Thanks for the very helpful answer. I think I found the cause: there is a secondary index with "Compacted partition maximum bytes: 6 GB", which looks like a candidate. I will try to drop it and repeat the attempt. –
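Dropping the suspect index can be done through cqlsh (a sketch; the keyspace and index names are placeholders):

```shell
cqlsh -e "DROP INDEX IF EXISTS my_keyspace.my_index;"
```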


It appears this was a network problem. I upgraded the kernel and the eth (igb) driver on the new node, and the OOM has been replaced by a different [problem](http://stackoverflow.com/questions/33869558) –