Lack of scalability in Cassandra

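I have a problem with the scalability of a Cassandra database. Although the number of nodes was increased from 2 to 8, the performance of the database does not improve.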

Cassandra Version: 3.7 
Cassandra Hardware x8: 1vCPU 2.5 GHz, 900 MB RAM, SSD DISK 20GB, 10 Gbps LAN 
Benchmark Hardware x1: 16vCPU 2.5 GHz, 8 GB RAM, SSD DISK 5GB, 10 Gbps LAN

The following default settings in cassandra.yaml were changed:

cluster_name: 'tst' 
seeds: "192.168.0.101,192.168.0.102,...108" 
listen_address: 192.168.0.xxx 
endpoint_snitch: GossipingPropertyFileSnitch 
rpc_address: 192.168.0.xxx 
concurrent_reads: 8 
concurrent_writes: 8 
concurrent_counter_writes: 8

Keyspace:

create keyspace tst WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '2' };

Example table:

CREATE TABLE shares (
    c1 int PRIMARY KEY, 
    c2 varchar, 
    c3 int, 
    c4 int, 
    c5 int, 
    c6 varchar, 
    c7 int 
);

An example of the queries used in the tests:

INSERT INTO shares (c1, c2, c3, c4, c5, c6, c7) VALUES (%s, '%s', %s, %s, %s, '%s', %s)

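To connect to the database I use https://github.com/datastax/java-driver. All threads share a single Cluster object and a single Session object, as the documentation recommends. Connection: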

PoolingOptions poolingOptions = new PoolingOptions(); 
poolingOptions.setConnectionsPerHost(HostDistance.LOCAL, 5, 300); 
poolingOptions.setCoreConnectionsPerHost(HostDistance.LOCAL, 10); 
poolingOptions.setPoolTimeoutMillis(5000); 
QueryOptions queryOptions = new QueryOptions(); 
queryOptions.setConsistencyLevel(ConsistencyLevel.QUORUM); 

Builder builder = Cluster.builder(); 
builder.withPoolingOptions(poolingOptions); 
builder.withQueryOptions(queryOptions); 
builder.withLoadBalancingPolicy(new RoundRobinPolicy()); 
this.setPoints(builder); // here all of the nodes are added 
Cluster cluster = builder.build();

Query code:

public ResultSet execute(String query) {
    // Synchronous call: blocks the calling thread until the result arrives
    ResultSet result = this.session.execute(query);
    return result;
}

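While the test runs, memory usage on all nodes is 80% and CPU usage is 100%. I am surprised by how low the connection usage reported by the monitor is: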

[2016-09-10 09:39:51.537] /192.168.0.102:9042 connections=10, current load=62, max load=10240 
[2016-09-10 09:39:51.556] /192.168.0.103:9042 connections=10, current load=106, max load=10240 
[2016-09-10 09:39:51.556] /192.168.0.104:9042 connections=10, current load=104, max load=10240 
[2016-09-10 09:39:51.556] /192.168.0.101:9042 connections=10, current load=196, max load=10240 
[2016-09-10 09:39:56.467] /192.168.0.102:9042 connections=10, current load=109, max load=10240 
[2016-09-10 09:39:56.467] /192.168.0.103:9042 connections=10, current load=107, max load=10240 
[2016-09-10 09:39:56.467] /192.168.0.104:9042 connections=10, current load=115, max load=10240 
[2016-09-10 09:39:56.468] /192.168.0.101:9042 connections=10, current load=169, max load=10240 
[2016-09-10 09:40:01.468] /192.168.0.102:9042 connections=10, current load=113, max load=10240 
[2016-09-10 09:40:01.468] /192.168.0.103:9042 connections=10, current load=84, max load=10240 
[2016-09-10 09:40:01.468] /192.168.0.104:9042 connections=10, current load=92, max load=10240 
[2016-09-10 09:40:01.469] /192.168.0.101:9042 connections=10, current load=205, max load=10240

Monitor code: https://github.com/datastax/java-driver/tree/3.0/manual/pooling#monitoring-and-tuning-the-pool

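I want to test the scalability of several NoSQL databases. With Redis the scaling was linear; here there is none at all, and I don't know why. Thanks for your help!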

2016-09-10 Sannin

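What kind of values do your partition keys have? How is the data distributed? Cassandra distributes data by computing a hash of the partition key. If all of your data shares a small number of partition key values, it doesn't matter how many servers you use. – riwalk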
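The effect riwalk describes can be illustrated with a small, self-contained sketch. It does not use Cassandra's real partitioner: a simple multiplicative hash and a modulo stand in for the token ring, and the class and method names (`PartitionSpread`, `nodeFor`, `nodesHit`) are hypothetical. It only shows that the number of nodes that can receive data is bounded by the number of distinct partition key values.

```java
import java.util.HashSet;
import java.util.Set;

class PartitionSpread {

    // Stand-in for the partitioner: map a partition key to one of nodeCount nodes.
    // Assumption: a plain multiplicative hash, NOT Cassandra's actual token ring.
    static int nodeFor(int partitionKey, int nodeCount) {
        int h = partitionKey * 0x9E3779B9; // wraps around on overflow, which is fine here
        return Math.floorMod(h, nodeCount);
    }

    // Count how many distinct nodes receive at least one of the given keys.
    static int nodesHit(int[] keys, int nodeCount) {
        Set<Integer> nodes = new HashSet<>();
        for (int key : keys) {
            nodes.add(nodeFor(key, nodeCount));
        }
        return nodes.size();
    }

    public static void main(String[] args) {
        int[] fewKeys = {1, 1, 1, 2, 2, 2}; // many rows, but only two distinct keys
        int[] manyKeys = new int[10_000];   // one distinct key per row
        for (int i = 0; i < manyKeys.length; i++) {
            manyKeys[i] = i;
        }
        System.out.println("few keys  -> nodes hit: " + nodesHit(fewKeys, 8));
        System.out.println("many keys -> nodes hit: " + nodesHit(manyKeys, 8));
    }
}
```

With two distinct keys, at most two of the eight nodes can be hit no matter how many rows are written; with many distinct keys, the rows spread across all eight.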

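1 GB of RAM per machine is very low. It can cause heavy GC pressure. Check the logs for GC activity, and try to work out whether the 100% CPU cap is consistently caused by JVM garbage collection.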

Another thing: how many threads are you running on each machine? If you are trying to scale with this code:

Query code:

public ResultSet execute(String query) {
    ResultSet result = this.session.execute(query);
    return result;
}

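then you won't get very far. Synchronous queries are hopelessly slow. Even if you try to use more threads, 1 GB of RAM is probably (I already know it is...) too low... You should write asynchronous queries, both for lower resource consumption and for scalability.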

2016-09-10 12:26:05 xmas79

Thanks! I use 1000 threads in the benchmark. If I use asynchronous queries, how would I check the results of the queries at a given point in time? – Sannin

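1000 threads is probably too many... Stick to 2 threads per vCPU (32 threads in your case) and go the async route. Increase them later. Typically you collect a list of ResultSetFuture objects. Once you have collected your maximum number of in-flight queries (make it 1000 initially), you wait for all of them to complete; that applies some back pressure and avoids putting your cluster under stress. You can also register callbacks on the futures if you prefer that style; see http://www.datastax.com/dev/blog/java-driver-async-queries for an example. – xmas79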
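The collect-then-wait pattern described above might look roughly like the following sketch. To keep it runnable without a cluster it uses only the JDK: `CompletableFuture` stands in for the driver's `ResultSetFuture`, and the `executeAsync` function parameter is a hypothetical stand-in for a call such as `session.executeAsync(query)`. The class name `BatchedAsyncWriter` and the simulated query in `main` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

class BatchedAsyncWriter {

    // Issue totalQueries asynchronous calls while never keeping more than
    // maxInFlight of them outstanding. Returns how many completed successfully.
    static int runAll(int totalQueries, int maxInFlight,
                      IntFunction<CompletableFuture<Boolean>> executeAsync) {
        List<CompletableFuture<Boolean>> inFlight = new ArrayList<>(maxInFlight);
        int succeeded = 0;
        for (int i = 0; i < totalQueries; i++) {
            inFlight.add(executeAsync.apply(i)); // returns immediately, does not block
            if (inFlight.size() == maxInFlight) {
                succeeded += drain(inFlight);    // back pressure: wait before sending more
            }
        }
        succeeded += drain(inFlight);            // wait for the final, partial batch
        return succeeded;
    }

    // Block until every collected future is done, count successes, then clear the list.
    static int drain(List<CompletableFuture<Boolean>> futures) {
        int ok = 0;
        for (CompletableFuture<Boolean> future : futures) {
            try {
                if (future.join()) {
                    ok++;
                }
            } catch (RuntimeException e) {
                // The simulated query failed; a real benchmark would log or retry here.
            }
        }
        futures.clear();
        return ok;
    }

    public static void main(String[] args) {
        // Simulated query: completes on a background thread and reports success.
        int done = runAll(10_000, 1000, i -> CompletableFuture.supplyAsync(() -> true));
        System.out.println("completed: " + done);
    }
}
```

In a multi-threaded benchmark each worker thread would run its own loop like this, so the total number of in-flight queries is the per-thread limit times the number of threads; the limit should be tuned with that in mind.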
