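Cassandra nodetool status inconsistent on different nodes with too many pending compaction tasks
I have a Cassandra 2.0.6 cluster with four nodes. The cluster is suffering from an inconsistency problem: when I run nodetool status on each node, the results disagree with each other. The status command also runs very slowly. The output from each node is shown below.
The nodes with IPs 192.168.148.181 and 192.168.148.121 are the seed nodes. Repair has never been run on this cluster.
In addition, CPU usage on 181 and 121 is very high, and the logs show that CMS GC runs very frequently on these nodes. I disconnected all clients, so there is no read or write load, but the inconsistency and the heavy GC activity persist.
So how can I debug and tune this cluster?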
[[email protected] apache-cassandra-2.0.16]$ time bin/nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.148.121 10.86 GB 1 25.0% 1d9ba597-c404-481f-af2b-436493c57227 RAC2
UN 192.168.148.181 10.53 GB 1 25.0% 5d90300f-2fb4-4065-9819-10ece285223d RAC1
DN 192.168.148.182 10.95 GB 1 25.0% bcb550df-9429-4cae-9fd2-0bfeea9a5649 RAC4
UN 192.168.148.221 10.49 GB 1 25.0% 6867f8b4-1f54-48fc-aaae-da71bc251970 RAC3
real 8m50.506s
user 39m48.718s
sys 76m48.566s
--------------------------------------------------------------------------------
[[email protected] apache-cassandra-2.0.16]$ time bin/nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
DN 192.168.148.121 10.86 GB 1 25.0% 1d9ba597-c404-481f-af2b-436493c57227 RAC2
UN 192.168.148.181 10.53 GB 1 25.0% 5d90300f-2fb4-4065-9819-10ece285223d RAC1
DN 192.168.148.182 10.95 GB 1 25.0% bcb550df-9429-4cae-9fd2-0bfeea9a5649 RAC4
UN 192.168.148.221 10.49 GB 1 25.0% 6867f8b4-1f54-48fc-aaae-da71bc251970 RAC3
real 0m15.075s
user 0m1.606s
sys 0m0.393s
----------------------------------------------------------------------
[[email protected] apache-cassandra-2.0.16]$ time bin/nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
DN 192.168.148.121 10.86 GB 1 25.0% 1d9ba597-c404-481f-af2b-436493c57227 RAC2
UN 192.168.148.181 10.53 GB 1 25.0% 5d90300f-2fb4-4065-9819-10ece285223d RAC1
UN 192.168.148.182 10.95 GB 1 25.0% bcb550df-9429-4cae-9fd2-0bfeea9a5649 RAC4
UN 192.168.148.221 10.49 GB 1 25.0% 6867f8b4-1f54-48fc-aaae-da71bc251970 RAC3
real 0m25.719s
user 0m2.152s
sys 0m1.228s
-------------------------------------------------------------------------
[[email protected] apache-cassandra-2.0.16]$ time bin/nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
DN 192.168.148.121 10.86 GB 1 25.0% 1d9ba597-c404-481f-af2b-436493c57227 RAC2
DN 192.168.148.181 10.53 GB 1 25.0% 5d90300f-2fb4-4065-9819-10ece285223d RAC1
UN 192.168.148.182 10.95 GB 1 25.0% bcb550df-9429-4cae-9fd2-0bfeea9a5649 RAC4
DN 192.168.148.221 10.49 GB 1 25.0% 6867f8b4-1f54-48fc-aaae-da71bc251970 RAC3
real 0m17.581s
user 0m1.843s
sys 0m1.632s
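To make the disagreement between the four views easier to see, the outputs can be compared programmatically. The following is only a minimal sketch: `parse_status` and `disagreements` are hypothetical helper names, and the labels `run1` to `run4` are stand-ins for the four reporting nodes, whose hostnames are not visible in the output above.

```python
import re

# Matches node rows such as "UN 192.168.148.121 10.86 GB 1 25.0% <host-id> RAC2".
ROW = re.compile(r"^([UD][NLJM])\s+(\d{1,3}(?:\.\d{1,3}){3})\s")


def parse_status(text):
    """Return {address: state} for every node row in one `nodetool status` output."""
    states = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            states[match.group(2)] = match.group(1)
    return states


def disagreements(views):
    """Return {address: {label: state}} for addresses the views do not agree on."""
    result = {}
    addresses = set().union(*(view.keys() for view in views.values()))
    for addr in sorted(addresses):
        seen = {label: view.get(addr, "??") for label, view in views.items()}
        if len(set(seen.values())) > 1:
            result[addr] = seen
    return result


# Node rows copied from the first output above.
FIRST_OUTPUT = """\
UN 192.168.148.121 10.86 GB 1 25.0% 1d9ba597-c404-481f-af2b-436493c57227 RAC2
UN 192.168.148.181 10.53 GB 1 25.0% 5d90300f-2fb4-4065-9819-10ece285223d RAC1
DN 192.168.148.182 10.95 GB 1 25.0% bcb550df-9429-4cae-9fd2-0bfeea9a5649 RAC4
UN 192.168.148.221 10.49 GB 1 25.0% 6867f8b4-1f54-48fc-aaae-da71bc251970 RAC3
"""

# States from the other three outputs, entered by hand to keep the sketch short.
views = {
    "run1": parse_status(FIRST_OUTPUT),
    "run2": {"192.168.148.121": "DN", "192.168.148.181": "UN",
             "192.168.148.182": "DN", "192.168.148.221": "UN"},
    "run3": {"192.168.148.121": "DN", "192.168.148.181": "UN",
             "192.168.148.182": "UN", "192.168.148.221": "UN"},
    "run4": {"192.168.148.121": "DN", "192.168.148.181": "DN",
             "192.168.148.182": "UN", "192.168.148.221": "DN"},
}

diff = disagreements(views)
for addr, seen in diff.items():
    print(addr, seen)
```

With the four outputs above, every one of the four addresses is reported as up by at least one view and down by at least one other.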
I printed the details of the objects on the heap that GC has to deal with:
num #instances #bytes class name
----------------------------------------------
1: 58584535 1874705120 java.util.concurrent.FutureTask
2: 58585802 1406059248 java.util.concurrent.Executors$RunnableAdapter
3: 58584601 1406030424 java.util.concurrent.LinkedBlockingQueue$Node
4: 58584534 1406028816 org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask
5: 214682 24087416 [B
6: 217294 10430112 java.nio.HeapByteBuffer
7: 37591 5977528 [C
8: 41843 5676048 <constMethodKlass>
9: 41843 5366192 <methodKlass>
10: 4126 4606080 <constantPoolKlass>
11: 100060 4002400 org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
12: 4126 2832176 <instanceKlassKlass>
13: 4880 2686216 [J
14: 3619 2678784 <constantPoolCacheKlass>
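To put the histogram in perspective, the byte counts of the top classes can be summed. This is a rough sketch that only re-uses the first five rows shown above; `parse_histogram` is a hypothetical helper, not part of any Cassandra or JDK tool.

```python
# First five rows copied from the histogram above.
HISTO = """\
1: 58584535 1874705120 java.util.concurrent.FutureTask
2: 58585802 1406059248 java.util.concurrent.Executors$RunnableAdapter
3: 58584601 1406030424 java.util.concurrent.LinkedBlockingQueue$Node
4: 58584534 1406028816 org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask
5: 214682 24087416 [B
"""


def parse_histogram(text):
    """Return a list of (class_name, instances, total_bytes) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like "<rank>: <instances> <bytes> <class name>".
        if len(parts) == 4 and parts[0].endswith(":"):
            rows.append((parts[3], int(parts[1]), int(parts[2])))
    return rows


rows = parse_histogram(HISTO)
top4_bytes = sum(nbytes for _, _, nbytes in rows[:4])

for name, count, nbytes in rows[:4]:
    print(f"{name}: {count} instances, {nbytes // count} bytes each")
print(f"top four classes together: {top4_bytes / 2**30:.2f} GiB")
```

The four largest classes each have roughly 58.58 million instances and together account for about 5.67 GiB. That pattern would be consistent with tens of millions of BackgroundCompactionTask objects sitting in an executor queue, each wrapped in a FutureTask, a RunnableAdapter and a LinkedBlockingQueue node, though the histogram alone does not prove that this is the cause of the GC pressure.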
I ran nodetool compactionstats on one node and found that a large number of compaction tasks had accumulated within 3 days (I restarted the cluster 4 days ago):
[[email protected] apache-cassandra-2.0.16]$ bin/nodetool compactionstats
pending tasks: 64642341
Active compaction remaining time : n/a
I also checked compactionhistory. Here is part of the result. It shows many records related to the system keyspace.
Compaction History:
id keyspace_name columnfamily_name compacted_at bytes_in bytes_out rows_merged
8e4f8830-b04f-11e5-a211-45b7aa88107c system sstable_activity 1451629144115 3342 915 {4:23}
96a6fcb0-b04b-11e5-a211-45b7aa88107c system hints 145162744{1:1}
7c42c940-adac-11e5-8bd4-45b7aa88107c system hints 1451339203540 56969835 56782732 {2:3}
585b97a0-ad98-11e5-8bd4-45b7aa88107c system sstable_activity 1451330553370 3700 956 {4:24}
aefc3f10-b1b2-11e5-a211-45b7aa88107c system sstable_activity 1451781670273 3201 906 {4:23}
1e76f1b0-b180-11e5-a211-45b7aa88107c system sstable_activity 1451759952971 3303 700 {4:23}
e7b75b70-aec2-11e5-8bd4-45b7aa88107c system hints 1451458783911 57690316 57497847 {2:3}
ad102280-af6d-11e5-b1dc-45b7aa88107c webtrn_study_log_formally SCORM_STU_COURSE 1451532129448 45671877 41137664 {1:11, 3:1, 4:8}
60906970-aec7-11e5-8bd4-45b7aa88107c system sstable_activity 1451460704647 3751 974 {4:25}
88aed310-ad91-11e5-8bd4-45b7aa88107c system hints 1451327627969 56984347 56765328 {2:3}
3ad14f00-af6d-11e5-b1dc-45b7aa88107c webtrn_study_log_formally SCORM_STU_COURSE 1451531937776 46696097 38827028 {1:8, 3:2, 4:9}
84df8fb0-b00f-11e5-a211-45b7aa88107c system hints 1451601640491 18970740 18970740 {1:1}
657482e0-ad33-11e5-8bd4-45b7aa88107c system sstable_activity 1451287196174 3701 931 {4:24}
9cc8af70-b24a-11e5-a211-45b7aa88107c system sstable_activity 1451846923239 3134 773 {4:23}
dcbe5e30-afd0-11e5-a211-45b7aa88107c system sstable_activity 1451574729619 3357 790 {4:23}
b285ced0-afa0-11e5-84e3-45b7aa88107c system hints 1451554042941 43310718 42137761 {1:1, 2:2}
119770e0-ad4e-11e5-8bd4-45b7aa88107c system hints 1451298651886 57397441 57190519 {2:3}
f1bb37a0-b204-11e5-a211-45b7aa88107c system hints 1451817000986 17713746
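To line these entries up with the restart and the 3-day window mentioned above, the compacted_at values can be turned into readable dates. This is a small sketch that assumes compacted_at is a Unix epoch timestamp in milliseconds. The sample values are copied from the listing above, and `compacted_at_utc` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def compacted_at_utc(ms):
    """Convert epoch milliseconds (assumed unit of compacted_at) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


# (table, compacted_at) pairs copied from the compaction history above.
EVENTS = [
    ("system.sstable_activity", 1451287196174),
    ("system.hints", 1451339203540),
    ("system.hints", 1451601640491),
    ("system.sstable_activity", 1451629144115),
    ("system.sstable_activity", 1451846923239),
]

for table, ms in sorted(EVENTS, key=lambda event: event[1]):
    print(compacted_at_utc(ms).isoformat(), table)

# Time covered by the sampled entries.
span = compacted_at_utc(EVENTS[-1][1]) - compacted_at_utc(EVENTS[0][1])
print("span covered:", span)
```

Under that assumption the sampled entries fall between late December 2015 and early January 2016, which makes it easier to see which compactions ran before and after the restart.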
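I tried to flush the node with the high GC activity, but the flush failed with a read timeout.
The cluster only receives inserts. During these 3 days I stopped the client writes and restarted the cluster, but compaction tasks are still accumulating.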
Hi, I have updated the problem description. It shows a large number of pending compaction tasks. – chenatu
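The first thing I notice is the presence of hints in the system keyspace. They may be either the cause or a consequence of your trouble. Disable hinted handoff to get a clearer picture: set 'hinted_handoff_enabled: false' in the cassandra.yaml file and try again. If the problem persists, you should look at the compaction parameters. What is the output of 'nodetool getcompactionthroughput'? – DineMartine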
Read [this document](https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_configure_compaction_t.html) to help you configure compaction. – DineMartine