Counting 'rows' in a Cassandra column family using the Python driver

How can I count the 'rows' in a Cassandra column family more efficiently with the Python driver? I am currently using the code below:

from cassandra.cluster import Cluster 
from sys import stdout 

servers = ['server1', 'server2'] 
cluster = Cluster(servers) 
session = cluster.connect() 

result = session.execute('select * from ks1.t1') 

count = 0 

for i in result: 
    count += 1 

print(count)


2016-04-13 Dimaf

I don't know what kind of object 'result' is, but try 'count = len(result)'. – armatita

Why can't you just use 'select count(*) from ks1.t1'? – mikea

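This is a horrible way to count rows. Basically you are doing a full table scan.

Getting a precise row count in a distributed system is hard.

You can get an estimate of the number of partitions (partitions == rows if the table has no clustering columns) using nodetool tablestats/cfstats.

If you really need a precise row count, use a co-located Spark install to fetch all the data locally into Spark memory and count it with Spark. That way the count is distributed and does not overwhelm the coordinator.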

Sample Scala code:

import com.datastax.spark.connector._ 

sc.cassandraTable("keyspace", "table_name").count()


2016-04-13 18:20:16 doanduyhai

Yes, it is horrible and slow, but I need to know the exact count. – Dimaf

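Brian Hess has a standalone tool, 'cassandra-count'.

It is a simple program that counts the number of records in a Cassandra table. By splitting the token range with the numSplits parameter, you reduce the amount counted per query and lower the probability of timeouts.

Spark is indeed well suited to this kind of operation, but the goal of this program is to be a simple utility that does not require Spark.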

https://github.com/brianmhess/cassandra-count
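The same token-range idea can be sketched in Python. This is only a minimal illustration, not how cassandra-count itself is implemented: it assumes the cluster uses the default Murmur3Partitioner and that `session` is an already connected `cassandra.cluster.Session`. The names `split_token_range` and `count_rows`, and the partition key column `id` in the usage note, are hypothetical.

```python
MIN_TOKEN = -(2 ** 63)   # lowest token of Murmur3Partitioner (assumed partitioner)
MAX_TOKEN = 2 ** 63 - 1  # highest token of Murmur3Partitioner


def split_token_range(num_splits):
    """Split the full token ring into num_splits contiguous (start, end) pairs, both inclusive."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    step = (MAX_TOKEN - MIN_TOKEN + 1) // num_splits
    ranges = []
    start = MIN_TOKEN
    for i in range(num_splits):
        # The last range absorbs the remainder so the whole ring is covered.
        end = MAX_TOKEN if i == num_splits - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def count_rows(session, keyspace, table, partition_key, num_splits=16):
    """Sum count(*) over token sub-ranges; partition_key is e.g. "id" or "a, b"."""
    query = (
        "SELECT count(*) FROM {ks}.{tbl} "
        "WHERE token({pk}) >= %s AND token({pk}) <= %s"
    ).format(ks=keyspace, tbl=table, pk=partition_key)
    total = 0
    for start, end in split_token_range(num_splits):
        # Each query touches only one slice of the ring, so it is less likely to time out.
        row = session.execute(query, (start, end)).one()
        total += row[0]
    return total
```

Called as `count_rows(session, 'ks1', 't1', 'id', num_splits=64)`, this would run 64 smaller count queries one after another. A larger `num_splits` makes each query cheaper at the cost of more round trips, and the total is only exact if nothing is written to the table while the queries run.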


2016-04-13 21:39:07 Bradski

Thanks. I have seen this tool, but I want to find a solution in Python. – Dimaf

To do this in Python, why not do the following:

from cassandra.cluster import Cluster 

servers = ['server1', 'server2'] 
cluster = Cluster(servers) 
session = cluster.connect() 

result = session.execute('select count(*) from ks1.t1') 

count = 0 
for row in result: # will only be 1 row 
    count += row.count 

print(count)


2017-10-16 15:55:42 Serenthia
