Cassandra: querying the latest data for multiple rows
I'm new to Cassandra. I collect system status every 5 minutes, so I created this table:
create table sysportal (hostname text, logged_date text, logged_time timestamp, service text, plugin_output text, status text, PRIMARY KEY((hostname, logged_date), logged_time, service, plugin_output, status));
Sample table:
hostname | logged_date | logged_time | service | plugin_output | status
--------------------------------------------------------------------------------
host1 | 2014-02-21 | 2014-02-21 07:25:30+0000 | disk | DISK OK | ok
host2 | 2014-02-21 | 2014-02-21 07:25:31+0000 | disk | DISK Warning | ok
host1 | 2014-02-22 | 2014-02-22 15:23:50+0000 | disk | DISK OK | ok
host2 | 2014-02-22 | 2014-02-22 15:23:50+0000 | disk | DISK Warning | ok
host1 | 2014-02-23 | 2014-02-23 15:23:50+0000 | load | LOAD OK | ok
host2 | 2014-02-23 | 2014-02-23 15:23:50+0000 | ping | PING OK | ok
How can I get the latest data for all hosts in a single query?
Currently I do this from Python:
select logged_date, logged_time from sysportal limit 1; => saved in Python variables var1, var2
select hostname from sysportal; => distinct hosts extracted in Python
Then:
for i in hosts:
    select service from sysportal where hostname=i and logged_date=var1 and logged_time=var2
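For reference, the per-host loop above could be written as the minimal sketch below. It is not a single-query solution. It assumes `session` behaves like a DataStax Python driver `Session` (`execute(query, params)` with `%s` placeholders, rows exposing columns as attributes). The names `LATEST_FOR_HOST` and `latest_per_host` are made up for illustration. Because `logged_time` is the first clustering column of `sysportal`, each partition can be read newest-first with `ORDER BY logged_time DESC LIMIT 1`, so there is no need to match an exact timestamp.

```python
# Assumption: `session` behaves like a DataStax Python driver Session, i.e.
# session.execute(query, params) accepts %s placeholders and returns rows
# that expose columns as attributes.
LATEST_FOR_HOST = (
    "SELECT hostname, logged_time, service, plugin_output, status "
    "FROM sysportal "
    "WHERE hostname = %s AND logged_date = %s "
    "ORDER BY logged_time DESC LIMIT 1"
)


def latest_per_host(session, hosts, logged_date):
    """Return {hostname: newest row for logged_date}, skipping hosts with no rows."""
    latest = {}
    for host in hosts:
        # One query per host: the partition key (hostname, logged_date) is fully specified.
        rows = list(session.execute(LATEST_FOR_HOST, (host, logged_date)))
        if rows:
            latest[host] = rows[0]
    return latest
```

Note that `LIMIT 1` returns a single row per host. If a host reports several services at the same timestamp, the limit would have to be raised and the rows filtered on the client.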
Can someone advise whether this can be done in a single Cassandra query? Should I create additional tables/column families?
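One possible shape for such an additional table, offered only as a sketch: a hypothetical `sysportal_latest` table keyed by `(hostname, service)`. Since a CQL `INSERT` overwrites an existing row with the same primary key, writing every check to this table as well would leave exactly one row per host and service, and a plain `SELECT *` would then return the current state in one query. The table name, the helper functions, and the `session` object (assumed to behave like a DataStax Python driver `Session`) are all assumptions, not part of the original schema.

```python
# Hypothetical table: one partition per host, one row per service.
CREATE_LATEST = (
    "CREATE TABLE IF NOT EXISTS sysportal_latest ("
    "hostname text, service text, logged_time timestamp, "
    "plugin_output text, status text, "
    "PRIMARY KEY (hostname, service))"
)
UPSERT_LATEST = (
    "INSERT INTO sysportal_latest "
    "(hostname, service, logged_time, plugin_output, status) "
    "VALUES (%s, %s, %s, %s, %s)"
)
SELECT_LATEST = "SELECT * FROM sysportal_latest"


def record_latest(session, hostname, service, logged_time, plugin_output, status):
    """Overwrite the stored row for (hostname, service) with the newest check."""
    session.execute(
        UPSERT_LATEST, (hostname, service, logged_time, plugin_output, status)
    )


def all_latest(session):
    """Read the current state of every host and service in a single query."""
    return list(session.execute(SELECT_LATEST))
```

The history would still live in `sysportal`; this second table only tracks the most recent check and would need to be written alongside it every 5 minutes.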
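I'm not sure I fully understand. Could you explain how creating a new table with logged_date (PK) | hostname helps? So the tables would be the existing sysportal with host1 | logged_date1 | logged_time1 | ... and another table with logged_date1 | host1? How do I map between them? – karthik
Updated the answer to be a bit clearer –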
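OK, so "select * sysportal_by_date where logged_date = order by logged_time desc;". I have about 200 hosts reporting every 5 minutes, so 1 hour of data is 2400 rows. When you say "logged_date = ", it will take all 2400 rows and sort them by logged time, giving host1:00, host2:00, host3:00 ... host1:05, host2:05, host3:05 ... host1:10, host2:10, host3:10 and so on. Can you suggest how I can get only the latest data for each host? – karthik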
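If the rows from the `sysportal_by_date` query do come back ordered by `logged_time` descending, one way to reduce them to the latest entry per host is to do it on the client, as in the sketch below. It assumes each row exposes a `hostname` attribute. The function name and the optional `expected_hosts` parameter are made up for illustration.

```python
def newest_row_per_host(rows_newest_first, expected_hosts=None):
    """Keep the first row seen for each hostname.

    Assumes rows arrive newest first and expose a `hostname` attribute.
    If expected_hosts is given, iteration stops once that many distinct
    hosts have been seen.
    """
    latest = {}
    for row in rows_newest_first:
        # setdefault keeps the first (newest) row and ignores older ones.
        latest.setdefault(row.hostname, row)
        if expected_hosts is not None and len(latest) >= expected_hosts:
            break
    return latest
```

With 200 hosts all reporting on the same 5-minute schedule, passing `expected_hosts=200` would let the loop stop after roughly the first 200 rows instead of walking the whole day. A host that has stopped reporting would make it scan further back.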