2014-02-13 33 views

What is the sort order of Cassandra UTF8Type keys? (Cassandra 2.0)

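All the documentation leads me to expect a lexical sort order (basically alphabetical). That does not appear to be the order Cassandra uses, and I am having a hard time guessing what it does use.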

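I built a table to count interactions that affect something called an "app", organized by time period within a day. (This is a simplified example to show the cause of my confusion.) I want to be able to look up a specific app. The CQL description of the table is: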

 
CREATE TABLE "appMetrics" (app text,time timestamp,counter_val counter, 
    PRIMARY KEY (app, time)) WITH COMPACT STORAGE; 

My data load:

 
update "appMetrics" set counter_val = counter_val+1 WHERE app='ab' AND time='2014-02-14 00:00:00'; 
update "appMetrics" set counter_val = counter_val+1 WHERE app='a' AND time='2014-02-14 00:00:00'; 
update "appMetrics" set counter_val = counter_val+1 WHERE app='c' AND time='2014-02-14 00:00:00'; 
update "appMetrics" set counter_val = counter_val+1 WHERE app='b' AND time='2014-02-14 00:00:00'; 
update "appMetrics" set counter_val = counter_val+1 WHERE app='bc' AND time='2014-02-14 00:00:00'; 
update "appMetrics" set counter_val = counter_val+1 WHERE app='ca' AND time='2014-02-14 00:00:00'; 

I select from the table and see this result:

 
    select * from "appMetrics"; 

     app | time                     | counter_val
    -----+--------------------------+-------------
       a | 2014-02-14 00:00:00-0500 |           1
       c | 2014-02-14 00:00:00-0500 |           1
      ab | 2014-02-14 00:00:00-0500 |           1
      ca | 2014-02-14 00:00:00-0500 |           1
      bc | 2014-02-14 00:00:00-0500 |           1
       b | 2014-02-14 00:00:00-0500 |           1

    (6 rows) 

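So this order is not alphabetical, not insertion order, and not any other order I can recognize. It is not random either, or at least it is repeatable: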

cqlsh:simplex> select * from "appMetrics" where token(app) >= token('ab'); 

 app | time                     | counter_val
-----+--------------------------+-------------
  ab | 2014-02-14 00:00:00-0500 |           1
  ca | 2014-02-14 00:00:00-0500 |           1
  bc | 2014-02-14 00:00:00-0500 |           1
   b | 2014-02-14 00:00:00-0500 |           1

(4 rows) 

cqlsh:simplex> select * from "appMetrics" where token(app) <= token('ab'); 

 app | time                     | counter_val
-----+--------------------------+-------------
   a | 2014-02-14 00:00:00-0500 |           1
   c | 2014-02-14 00:00:00-0500 |           1
  ab | 2014-02-14 00:00:00-0500 |           1

(3 rows) 

For what it's worth, the column family is described as:

 
    ColumnFamily: appMetrics 
     Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type 
     Default column value validator: org.apache.cassandra.db.marshal.CounterColumnType 
     Cells sorted by: org.apache.cassandra.db.marshal.TimestampType 
     GC grace seconds: 864000 
     Compaction min/max thresholds: 4/32 
     Read repair chance: 0.1 
     DC Local Read repair chance: 0.0 
     Populate IO Cache on flush: false 
     Replicate on write: true 
     Caching: KEYS_ONLY 
     Default time to live: 0 
     Bloom Filter FP chance: 0.01 
     Index interval: 128 
     Speculative Retry: 99.0PERCENTILE 
     Built indexes: [] 
     Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy 
     Compression Options: 
     sstable_compression: org.apache.cassandra.io.compress.LZ4Compressor

Can someone explain how these rows are sorted?

Answer


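OK, I think I now know the answer to this question. Rows (partitions) are stored in token order, because each partition is placed according to the token of its partition key rather than the key itself.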

As an example, for the same table shown above, I requested the token value of each key and got this:

 
cqlsh:simplex> select token(app), app from "appMetrics"; 

 token(app)           | app
----------------------+-----
 -8839064797231613815 |   a
 -8198557465434950441 |   c
 -7815133031266706642 |  ab
  -633243080167210587 |  ca
  4832945267908438539 |  bc
  8833996863197925870 |   b

(6 rows) 
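
To check that token order really explains the results in the question, here is a small Python sketch. It does not talk to Cassandra or compute Murmur3 hashes; it only sorts the token values copied from the output above. The function names are made up for illustration.

```python
# Token values copied verbatim from the `select token(app), app` output.
TOKENS = {
    "a": -8839064797231613815,
    "c": -8198557465434950441,
    "ab": -7815133031266706642,
    "ca": -633243080167210587,
    "bc": 4832945267908438539,
    "b": 8833996863197925870,
}


def rows_in_token_order(tokens):
    """Return partition keys sorted by token value (hypothetical helper)."""
    return sorted(tokens, key=tokens.get)


def rows_from(tokens, start_key):
    """Emulate `WHERE token(app) >= token(start_key)` on the sample data."""
    start = tokens[start_key]
    return [k for k in rows_in_token_order(tokens) if tokens[k] >= start]


def rows_up_to(tokens, end_key):
    """Emulate `WHERE token(app) <= token(end_key)` on the sample data."""
    end = tokens[end_key]
    return [k for k in rows_in_token_order(tokens) if tokens[k] <= end]


if __name__ == "__main__":
    print(rows_in_token_order(TOKENS))  # order of the unrestricted select
    print(rows_from(TOKENS, "ab"))      # the >= token('ab') query
    print(rows_up_to(TOKENS, "ab"))     # the <= token('ab') query
```

Sorting by token gives a, c, ab, ca, bc, b, which matches the unrestricted select, and the two range helpers return the same rows as the two `token(app)` queries in the question.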

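More information: this happens because I am using the default Murmur3Partitioner. I could (I think) get the rows back in alphabetical order by using ByteOrderedPartitioner instead. Unfortunately, the partitioner is set at the cluster level, so it applies to the whole cluster. DataStax does not recommend ByteOrderedPartitioner (http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html).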
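
For comparison, here is a rough Python sketch of the order a byte-ordered partitioner would be expected to produce for the same keys. It assumes the partitioner compares the raw UTF-8 bytes of each key as unsigned values. This is plain Python with a made-up helper name, not Cassandra code.

```python
def byte_order(keys):
    """Sort keys by their UTF-8 encoding, compared byte by byte.

    Python compares `bytes` objects lexicographically as unsigned values,
    which is the comparison this sketch assumes.
    """
    return sorted(keys, key=lambda k: k.encode("utf-8"))


# Keys in the order they were inserted in the question.
APPS = ["ab", "a", "c", "b", "bc", "ca"]

if __name__ == "__main__":
    print(byte_order(APPS))             # alphabetical for lowercase ASCII keys
    print(byte_order(["b", "B", "a"]))  # uppercase sorts before lowercase
```

For these lowercase ASCII keys, byte order and alphabetical order agree. They diverge once keys mix case or contain non-ASCII characters, because the comparison is on encoded bytes rather than on any locale-aware collation.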