2017-04-27 220 views

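I would like to know whether there is a way to do something similar to bucket_selector in an Elasticsearch bucket aggregation, but testing on a key match rather than a numeric metric.

To give more context, here is my use case:

Sample data: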

[
    {
        "@version": "1",
        "@timestamp": "2017-04-27T04:28:23.589Z",
        "type": "json",
        "headers": {
            "message": {
                "type": "requestactivation"
            }
        },
        "id": "668"
    },
    {
        "@version": "1",
        "@timestamp": "2017-04-27T04:32:23.589Z",
        "type": "json",
        "headers": {
            "message": {
                "type": "requestactivation"
            }
        },
        "id": "669"
    },
    {
        "@version": "1",
        "@timestamp": "2017-04-27T04:30:00.802Z",
        "type": "json",
        "headers": {
            "message": {
                "type": "activationrequested"
            }
        },
        "id": "668"
    }
]

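I want to retrieve all ids whose last event is of type requestactivation.

I already have an aggregation that retrieves the last event type for each id, but I haven't figured out how to filter the buckets based on the key.

Here is the query: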

{
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "exists": {
                        "field": "id"
                    }
                },
                {
                    "terms": {
                        "headers.message.type": [
                            "requestactivation",
                            "activationrequested"
                        ]
                    }
                }
            ]
        }
    },
    "aggs": {
        "id": {
            "terms": {
                "field": "id",
                "size": 10000
            },
            "aggs": {
                "latest": {
                    "max": {
                        "field": "@timestamp"
                    }
                },
                "hmtype": {
                    "terms": {
                        "field": "headers.message.type",
                        "size": 1
                    }
                }
            }
        }
    }
}

Here is a sample of the results:

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "id": {
            "doc_count_error_upper_bound": 3,
            "sum_other_doc_count": 46,
            "buckets": [
                {
                    "key": "986",
                    "doc_count": 4,
                    "hmtype": {
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 2,
                        "buckets": [
                            {
                                "key": "activationrequested",
                                "doc_count": 2
                            }
                        ]
                    },
                    "latest": {
                        "value": 1493238253603,
                        "value_as_string": "2017-04-26T20:24:13.603Z"
                    }
                },
                {
                    "key": "967",
                    "doc_count": 2,
                    "hmtype": {
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 1,
                        "buckets": [
                            {
                                "key": "requestactivation",
                                "doc_count": 1
                            }
                        ]
                    },
                    "latest": {
                        "value": 1493191161242,
                        "value_as_string": "2017-04-26T07:19:21.242Z"
                    }
                },
                {
                    "key": "554",
                    "doc_count": 7,
                    "hmtype": {
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 5,
                        "buckets": [
                            {
                                "key": "requestactivation",
                                "doc_count": 5
                            }
                        ]
                    },
                    "latest": {
                        "value": 1493200196871,
                        "value_as_string": "2017-04-26T09:49:56.871Z"
                    }
                }
            ]
        }
    }
}

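All mappings are not analyzed (keyword).

The goal is to reduce the results to only those whose hmtype bucket key is "requestactivation".

I can't use the doc count, because activationrequest may occur multiple times for an id.

I only recently started digging into aggregations, so apologies if the question seems obvious; the examples I found don't seem to fit this particular logic.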
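
A terms sub-aggregation with "size": 1 returns the most frequent type for each id, not the most recent one. To inspect the type of the latest event per id, one option might be a top_hits sub-aggregation sorted by @timestamp in descending order. This is a minimal sketch, assuming the field names from the sample data; the aggregation name latest_event is hypothetical:

```json
{
  "size": 0,
  "aggs": {
    "id": {
      "terms": { "field": "id", "size": 10000 },
      "aggs": {
        "latest_event": {
          "top_hits": {
            "size": 1,
            "sort": [ { "@timestamp": { "order": "desc" } } ],
            "_source": { "includes": [ "headers.message.type" ] }
          }
        }
      }
    }
  }
}
```

This only displays the latest type for each id. It does not filter the id buckets, because bucket_selector can only test numeric values referenced through buckets_path.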

Answer

How about using include in the terms aggregation to "filter" the term values so that only those relevant to the request are included:

{
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "exists": {
                        "field": "id"
                    }
                },
                {
                    "terms": {
                        "headers.message.type": [
                            "requestactivation",
                            "activationrequested"
                        ]
                    }
                }
            ]
        }
    },
    "aggs": {
        "id": {
            "terms": {
                "field": "id",
                "size": 10000
            },
            "aggs": {
                "latest": {
                    "max": {
                        "field": "@timestamp"
                    }
                },
                "hmtype": {
                    "filter": {
                        "terms": {
                            "headers.message.type": [
                                "requestactivation",
                                "activationrequested"
                            ]
                        }
                    },
                    "aggs": {
                        "count_types": {
                            "cardinality": {
                                "field": "headers.message.type"
                            }
                        }
                    }
                },
                "filter_buckets": {
                    "bucket_selector": {
                        "buckets_path": {
                            "totalTypes": "hmtype > count_types"
                        },
                        "script": "params.totalTypes == 2"
                    }
                }
            }
        }
    }
}
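
For comparison, the include option on a terms aggregation could look roughly like the sketch below. It assumes the same field names as the queries above and is only an illustration of the option, not a tested solution:

```json
{
  "size": 0,
  "aggs": {
    "id": {
      "terms": { "field": "id", "size": 10000 },
      "aggs": {
        "hmtype": {
          "terms": {
            "field": "headers.message.type",
            "include": [ "requestactivation" ]
          }
        }
      }
    }
  }
}
```

Note that include only restricts which type terms become sub-buckets inside each id bucket. It does not remove any id bucket from the outer aggregation.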

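I may be missing something, but after testing the proposed include I end up with all the ids that have an "activationrequested" event (taken from your example; I'm actually looking for "requestactivation"), regardless of whether the id has events of other types. – Olivier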

My bad, it should be "include": "requestactivation"... but I feel there are some limitations along the way. –

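But include behaves basically the same way as if I had filtered out the activationrequested **events** in the query (since I don't care about the query hits per se), whereas I want to filter out the **ids** for which an activationrequested event was received. – Olivier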
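
Since bucket_selector can only test numeric values, one way to express "the last event for this id is requestactivation" might be to compare the overall latest @timestamp with the latest @timestamp among requestactivation events only. This is a minimal sketch, assuming the field names from the sample data; the names latest_request, ts and last_is_request are hypothetical, and an id whose two event types share exactly the same latest timestamp would still pass:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "exists": { "field": "id" } },
        {
          "terms": {
            "headers.message.type": [ "requestactivation", "activationrequested" ]
          }
        }
      ]
    }
  },
  "aggs": {
    "id": {
      "terms": { "field": "id", "size": 10000 },
      "aggs": {
        "latest": { "max": { "field": "@timestamp" } },
        "latest_request": {
          "filter": { "term": { "headers.message.type": "requestactivation" } },
          "aggs": {
            "ts": { "max": { "field": "@timestamp" } }
          }
        },
        "last_is_request": {
          "bucket_selector": {
            "buckets_path": {
              "latest": "latest",
              "latestRequest": "latest_request>ts"
            },
            "script": "params.latest == params.latestRequest"
          }
        }
      }
    }
  }
}
```

With the sample data, id 669 should be kept (its only event is requestactivation) and id 668 should be dropped (its latest event is activationrequested). An id with no requestactivation events has no value for latest_request>ts, so the comparison is not expected to hold and the bucket should be dropped as well.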