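Great question. It took a bit of effort to figure out, but I managed to get it working with the new bucket selector aggregation in ES 2.0. I had to change the timestamp to an "integer" type to get it working (it will work with dates as well, though).
I created a simple index and used a _bulk request to add your data: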
PUT /test_index
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"timestamp": 0,"user":"mike","result":"failed"}
{"index":{"_id":2}}
{"timestamp": 1,"user":"anne","result":"failed"}
{"index":{"_id":3}}
{"timestamp": 2,"user":"bob","result":"success"}
{"index":{"_id":4}}
{"timestamp": 3,"user":"tom","result":"success"}
{"index":{"_id":5}}
{"timestamp": 4,"user":"jane","result":"failed"}
{"index":{"_id":6}}
{"timestamp": 5,"user":"anne","result":"success"}
{"index":{"_id":7}}
{"timestamp": 6,"user":"tom","result":"failed"}
{"index":{"_id":8}}
{"timestamp": 7,"user":"jane","result":"failed"}
{"index":{"_id":9}}
{"timestamp": 8,"user":"mike","result":"success"}
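As a side note, the _bulk body is newline-delimited JSON: an action line followed by a source line for each document. Here is a minimal Python sketch that builds the same body from a list of events. The helper name build_bulk_body is hypothetical (it is not part of any client library), and the sketch only produces the text; it does not send anything to a cluster.

```python
import json

# The nine sample events from the bulk request above: (timestamp, user, result).
EVENTS = [
    (0, "mike", "failed"),
    (1, "anne", "failed"),
    (2, "bob", "success"),
    (3, "tom", "success"),
    (4, "jane", "failed"),
    (5, "anne", "success"),
    (6, "tom", "failed"),
    (7, "jane", "failed"),
    (8, "mike", "success"),
]


def build_bulk_body(events):
    """Return an NDJSON _bulk body: one action line plus one source line per event."""
    lines = []
    for doc_id, (timestamp, user, result) in enumerate(events, start=1):
        # Action line: index the document under an explicit _id.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps({"timestamp": timestamp, "user": user, "result": result}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(EVENTS)
```

The resulting string could be posted to /test_index/doc/_bulk with whatever HTTP client you prefer.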
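Then I got what you asked for (I think) with the following query. Under the top-level "user_terms" aggregation, I set up three sub-aggregations:
- "failed_filter" selects the documents with "result": "failed", and a sub-aggregation then finds the maximum timestamp in that group;
- "success_filter" selects the documents with "result": "success", and a sub-aggregation then finds the maximum timestamp in that group;
- finally, "failed_lt_success_filter" keeps only those user buckets whose (max) failed timestamp is less than their (max) success timestamp.
Whew.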
POST /test_index/_search
{
  "size": 0,
  "aggregations": {
    "user_terms": {
      "terms": {
        "field": "user"
      },
      "aggs": {
        "failed_filter": {
          "filter": { "term": { "result": "failed" } },
          "aggs": {
            "max_timestamp": { "max": { "field": "timestamp" } }
          }
        },
        "success_filter": {
          "filter": { "term": { "result": "success" } },
          "aggs": {
            "max_timestamp": { "max": { "field": "timestamp" } }
          }
        },
        "failed_lt_success_filter": {
          "bucket_selector": {
            "buckets_path": {
              "failed_timestamp": "failed_filter.max_timestamp",
              "success_timestamp": "success_filter.max_timestamp"
            },
            "script": "failed_timestamp < success_timestamp"
          }
        }
      }
    }
  }
}
which returns:
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "user_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "anne",
          "doc_count": 2,
          "success_filter": {
            "doc_count": 1,
            "max_timestamp": {
              "value": 5
            }
          },
          "failed_filter": {
            "doc_count": 1,
            "max_timestamp": {
              "value": 1
            }
          }
        },
        {
          "key": "mike",
          "doc_count": 2,
          "success_filter": {
            "doc_count": 1,
            "max_timestamp": {
              "value": 8
            }
          },
          "failed_filter": {
            "doc_count": 1,
            "max_timestamp": {
              "value": 0
            }
          }
        }
      ]
    }
  }
}
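To sanity-check that result without a cluster, the same selection logic can be sketched in plain Python over the nine sample events. This is only an illustration, not how Elasticsearch evaluates the aggregation internally. The function name users_failed_before_success is hypothetical, and the sketch assumes that a user with no failed or no success document is dropped, which matches the response above (bob and jane do not appear).

```python
from collections import defaultdict

# The nine sample events: (timestamp, user, result).
EVENTS = [
    (0, "mike", "failed"),
    (1, "anne", "failed"),
    (2, "bob", "success"),
    (3, "tom", "success"),
    (4, "jane", "failed"),
    (5, "anne", "success"),
    (6, "tom", "failed"),
    (7, "jane", "failed"),
    (8, "mike", "success"),
]


def users_failed_before_success(events):
    """Keep users whose latest failure is older than their latest success."""
    # Equivalent of the terms aggregation on "user" with a max(timestamp)
    # per "result" value inside each user bucket.
    max_ts = defaultdict(dict)
    for timestamp, user, result in events:
        previous = max_ts[user].get(result)
        max_ts[user][result] = timestamp if previous is None else max(previous, timestamp)

    kept = {}
    for user, by_result in max_ts.items():
        failed = by_result.get("failed")
        success = by_result.get("success")
        # Assumption: a bucket with a missing value is dropped.
        if failed is None or success is None:
            continue
        # Equivalent of the script "failed_timestamp < success_timestamp".
        if failed < success:
            kept[user] = {"failed": failed, "success": success}
    return kept


selected = users_failed_before_success(EVENTS)
print(sorted(selected))  # ['anne', 'mike']
```

Only anne (1 < 5) and mike (0 < 8) survive; tom is dropped because his latest failure (6) comes after his latest success (3).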
Here's some code I used to play around with the problem:
http://sense.qbox.io/gist/06083e06191445a44610f32baf1dd45c7370401e
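Would you consider a different domain model, where each user has a single document with an array of timestamped results, such as '{"user":"mike","results":[{"timestamp":"t0","result":"failed"},{"timestamp":"t8","result":"success"}]}'? Or do you absolutely want a separate document for each event? – Val
I'm not tied to the domain model at all. The current structure is easier to deal with in terms of our current data processing, but I'd be happy to see alternatives. How would your suggested structure be used? – Andrew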