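Is the Cassandra DataStax driver slow?
I have just started trying out Cassandra, using C# and the DataStax driver (v 3.0.8). I want to run some performance tests to see how fast Cassandra handles time-series data.
The results are disappointing: a SELECT takes an eternity, so I suspect I'm doing something wrong.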
I have installed Cassandra on my local machine and created a table:
CREATE KEYSPACE dm WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE dm.daily_data_by_day (
    symbol text,
    value_type int,
    as_of_day date,
    revision_timestamp_utc timestamp,
    value decimal,
    PRIMARY KEY ((symbol, value_type), as_of_day, revision_timestamp_utc)
) WITH CLUSTERING ORDER BY (as_of_day ASC, revision_timestamp_utc ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
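I have filled this table with about 150,000 rows, split into about 10,000 partitions, each containing up to 10,000 rows.
Here is the test I'm running (updated at the request of phact):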
using System;
using System.Diagnostics;
using System.Linq;
using Cassandra;        // DataStax C# driver
using NUnit.Framework;

[TestFixture]
public class SelectPerformanceTests
{
    private Cluster _cluster;
    private Stopwatch _stopwatch;   // accumulates time spent in session.Execute() only

    [Test]
    public void SelectPerformance()
    {
        // The cluster is built once for the whole test run.
        _cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
        _stopwatch = new Stopwatch();
        var items = new[]
        {
            // 20 different items...
        };
        foreach (var item in items)
        {
            // Wall-clock time per item: connect, prepare, execute, materialize rows, close session.
            var watch = Stopwatch.StartNew();
            var rows = ExecuteQuery(item.Symbol, item.FieldType, item.StartDate, item.EndDate);
            watch.Stop();
            Console.WriteLine($"{watch.ElapsedMilliseconds}\t{rows.Length}");
        }
        Console.WriteLine($"Average Execute: {_stopwatch.ElapsedMilliseconds/items.Length}");
        _cluster.Dispose();
    }

    private Row[] ExecuteQuery(string symbol, int fieldType, LocalDate startDate, LocalDate endDate)
    {
        // A new session is opened, and the statement prepared again, on every call.
        using (var session = _cluster.Connect("dm"))
        {
            var ps = session.Prepare(
                @"SELECT
                    symbol,
                    value_type,
                    as_of_day,
                    revision_timestamp_utc,
                    value
                FROM
                    daily_data_by_day
                WHERE
                    symbol = ? AND
                    value_type = ? AND
                    as_of_day >= ? AND as_of_day < ?");
            var statement = ps.Bind(symbol, fieldType, startDate, endDate);
            statement.EnableTracing();
            _stopwatch.Start();
            var rowSet = session.Execute(statement);
            _stopwatch.Stop();
            return rowSet.ToArray();
        }
    }
}
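The DataStax driver documentation advises creating one `Cluster` per application and reusing a session rather than opening one per query, and preparing a statement once and then only binding it per execution. The test above instead opens a new session and re-prepares the statement on every call. The following is a minimal sketch, assuming driver 3.x and the single local node and `dm` keyspace from the question, that keeps the same query but reuses the session and prepared statement, and times `Execute()` separately from `ToArray()`. The class `ReusedSessionBenchmark` and its members are hypothetical names, not part of the driver.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using Cassandra;

// Hypothetical benchmark helper: one Cluster, one ISession and one
// PreparedStatement are created up front and reused for every query.
public sealed class ReusedSessionBenchmark : IDisposable
{
    private const string Cql =
        "SELECT symbol, value_type, as_of_day, revision_timestamp_utc, value " +
        "FROM daily_data_by_day " +
        "WHERE symbol = ? AND value_type = ? AND as_of_day >= ? AND as_of_day < ?";

    private readonly Cluster _cluster;
    private readonly ISession _session;
    private readonly PreparedStatement _select;

    public ReusedSessionBenchmark(string contactPoint, string keyspace)
    {
        _cluster = Cluster.Builder().AddContactPoint(contactPoint).Build();
        _session = _cluster.Connect(keyspace);  // opened once, not per query
        _select = _session.Prepare(Cql);         // prepared once, bound per query
    }

    // Runs one query and returns the rows plus two timings:
    // the time spent in Execute() and the time spent enumerating the RowSet.
    public (Row[] Rows, TimeSpan Execute, TimeSpan Materialize) Run(
        string symbol, int valueType, LocalDate from, LocalDate to)
    {
        var statement = _select.Bind(symbol, valueType, from, to);

        var watch = Stopwatch.StartNew();
        RowSet rowSet = _session.Execute(statement);
        TimeSpan executeTime = watch.Elapsed;

        watch.Restart();
        Row[] rows = rowSet.ToArray();  // also fetches further pages if the result has more than one
        TimeSpan materializeTime = watch.Elapsed;

        return (rows, executeTime, materializeTime);
    }

    // Shutting down the cluster also shuts down the sessions created from it.
    public void Dispose() => _cluster.Dispose();
}
```

Comparing the two timings per query, and how they change once the session has been in use for a while, should help show whether the cost lies in the request itself or in enumerating the rows.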
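The stopwatch tells me that session.Execute() takes 20-30 ms (update: after changing the code to create the cluster only once, this dropped to about 15 ms). So I enabled tracing and got the following results: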
activity                                                                  | source_elapsed (µs)
---------------------------------------------------------------------------+--------------------
Parsing SELECT symbol, value_type, as_of_day, revision_timestamp_utc,...; | 47
Preparing statement                                                       | 98
Executing single-partition query on daily_data_by_day                     | 922
Acquiring sstable references                                              | 939
Skipped 0/5 non-slice-intersecting sstables, included 0 due to tombstones | 978
Bloom filter allows skipping sstable 74                                   | 1003
Bloom filter allows skipping sstable 75                                   | 1015
Bloom filter allows skipping sstable 72                                   | 1024
Bloom filter allows skipping sstable 73                                   | 1032
Key cache hit for sstable 63                                              | 1043
Merged data from memtables and 5 sstables                                 | 1329
Read 100 live and 0 tombstone cells                                       | 1353
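If I understand this trace correctly, Cassandra spends less than 1.4 ms executing my query. So what is the DataStax driver doing for the rest of the time?
(For reference, I ran the same performance test against a local SQL Server instance, which executes the same query from C# in about 1-2 ms.)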
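Update:
I tried to do some profiling, which is not easy with asynchronous code that you don't own...
My conclusion is that most of the time is spent parsing the response. Each response contains between 2,000 and 3,000 rows, and parsing one response takes about 9 ms. Deserialization accounts for most of that, about 6.5 ms, and decimal is the worst: about 3 ms per response for the decimal column. The other columns (text, int, date and timestamp) take about 0.5 ms each.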
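Part of why the decimal column is the most expensive is its wire format. The CQL native protocol encodes a `decimal` as a 4-byte big-endian `int` scale followed by a `varint` (big-endian two's complement) holding the unscaled value, meaning unscaled × 10^(-scale). The sketch below decodes that format by hand with `System.Numerics.BigInteger` to illustrate the work any decoder has to do per cell; it is not the DataStax driver's own implementation, and `CqlDecimal.Decode` is a hypothetical name.

```csharp
using System;
using System.Numerics;

// Illustrative (hypothetical) decoder for the CQL 'decimal' encoding:
// [4-byte big-endian int scale][big-endian two's-complement varint unscaled value].
// Value = unscaled * 10^(-scale).
public static class CqlDecimal
{
    private static readonly BigInteger MaxMantissa = (BigInteger.One << 96) - 1;

    public static decimal Decode(byte[] buffer)
    {
        if (buffer == null || buffer.Length < 5)
            throw new ArgumentException("Expected a 4-byte scale and at least one varint byte.", nameof(buffer));

        // Big-endian 32-bit scale.
        int scale = (buffer[0] << 24) | (buffer[1] << 16) | (buffer[2] << 8) | buffer[3];

        // BigInteger(byte[]) expects little-endian two's complement, so reverse the varint bytes.
        var varint = new byte[buffer.Length - 4];
        Array.Copy(buffer, 4, varint, 0, varint.Length);
        Array.Reverse(varint);
        var unscaled = new BigInteger(varint);

        // A negative scale means "multiply by a power of ten"; fold it into the mantissa.
        if (scale < 0)
        {
            unscaled *= BigInteger.Pow(10, -scale);
            scale = 0;
        }

        // System.Decimal holds a 96-bit mantissa and a scale of 0..28.
        BigInteger magnitude = BigInteger.Abs(unscaled);
        if (scale > 28 || magnitude > MaxMantissa)
            throw new OverflowException("Value does not fit in System.Decimal.");

        // Split the magnitude into three little-endian 32-bit words.
        byte[] bytes = magnitude.ToByteArray();
        var words = new int[3];
        for (int i = 0; i < bytes.Length && i < 12; i++)
            words[i / 4] |= bytes[i] << (8 * (i % 4));

        return new decimal(words[0], words[1], words[2], unscaled.Sign < 0, (byte)scale);
    }
}
```

By contrast, a CQL `int` cell is a fixed 4-byte read. If exact decimal semantics are not required, a `double` column would avoid the arbitrary-precision step, though whether that saving matters here would have to be measured.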
Looking back at my measured times, I should have suspected this: the more rows in the response, the longer it takes, almost linearly.
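The near-linear relationship can be quantified from the pairs the test already prints (elapsed milliseconds and row count per query) with an ordinary least-squares fit: the slope estimates the cost per row and the intercept the fixed cost per query. This is a small sketch assuming those pairs have been collected into two arrays; `LinearFit.Fit` is a hypothetical helper, and no numbers from the question are used.

```csharp
using System;

// Hypothetical helper: ordinary least squares for elapsedMs ≈ intercept + slope * rows.
public static class LinearFit
{
    public static (double Slope, double Intercept) Fit(double[] rows, double[] elapsedMs)
    {
        if (rows.Length != elapsedMs.Length || rows.Length < 2)
            throw new ArgumentException("Need at least two (rows, elapsed) pairs of equal length.");

        // Means of both series.
        double meanX = 0, meanY = 0;
        for (int i = 0; i < rows.Length; i++) { meanX += rows[i]; meanY += elapsedMs[i]; }
        meanX /= rows.Length;
        meanY /= rows.Length;

        // Covariance and variance terms (unnormalized).
        double sxy = 0, sxx = 0;
        for (int i = 0; i < rows.Length; i++)
        {
            double dx = rows[i] - meanX;
            sxy += dx * (elapsedMs[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0)
            throw new ArgumentException("All row counts are equal; the slope is undefined.");

        double slope = sxy / sxx;              // milliseconds per row
        return (slope, meanY - slope * meanX); // intercept: fixed milliseconds per query
    }
}
```

A clearly positive slope with a small intercept would be consistent with per-row decoding dominating; a large intercept would instead point at per-query overhead such as opening a session.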
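Are you running these tests in a local Cassandra environment? With only one node? I would profile your code. – k0ner
@k0ner These were all done on my local machine, with only one node. It's for evaluating Cassandra, learning how to use it and seeing how it performs. –
Have you tried profiling your code? – k0ner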