Grouping consecutive documents with Elasticsearch

Is there a way to make Elasticsearch take sequence gaps into account when grouping?

Given the following data, bulk-imported into Elasticsearch:

{ "index": { "_index": "test", "_type": "groupingTest", "_id": "1" } } 
{ "sequence": 1, "type": "A" } 
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "2" } } 
{ "sequence": 2, "type": "A" } 
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "3" } } 
{ "sequence": 3, "type": "B" } 
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "4" } } 
{ "sequence": 4, "type": "A" } 
{ "index": { "_index": "test", "_type": "groupingTest", "_id": "5" } } 
{ "sequence": 5, "type": "A" }

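Is there a way to query this data so that

the documents with sequence numbers 1 and 2 go into one output group,
the document with sequence number 3 goes into another, and
the documents with sequence numbers 4 and 5 go into a third group?

... taking into account the fact that the run of type A documents is interrupted by a type B item (or any other item that is not type A)?

I would like the resulting buckets to look something like this (the name and value of sequence_group may differ; I am only trying to illustrate the logic):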

"buckets": [ 
    { 
     "key": "a", 
     "sequence_group": 1, 
     "doc_count": 2 
    }, 
    { 
     "key": "b", 
     "sequence_group": 3, 
     "doc_count": 1 
    }, 
    { 
     "key": "a", 
     "sequence_group": 4, 
     "doc_count": 2 
    } 
]
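
To make the intended grouping logic concrete, here is a minimal client-side sketch in Python. It is not an Elasticsearch query; it only assumes the five documents above have already been fetched and can be sorted by sequence. The function name group_runs is hypothetical, and the keys keep the original case of type ("A", "B") rather than the lowercase keys shown in the bucket listing.

```python
from itertools import groupby

# The five sample documents from the bulk request above.
docs = [
    {"sequence": 1, "type": "A"},
    {"sequence": 2, "type": "A"},
    {"sequence": 3, "type": "B"},
    {"sequence": 4, "type": "A"},
    {"sequence": 5, "type": "A"},
]


def group_runs(documents):
    """Group documents into runs of adjacent items (by sequence) sharing a type."""
    ordered = sorted(documents, key=lambda d: d["sequence"])
    buckets = []
    # groupby starts a new group whenever the key changes between neighbours,
    # so a type B document in the middle splits the type A documents in two.
    for doc_type, run in groupby(ordered, key=lambda d: d["type"]):
        run = list(run)
        buckets.append({
            "key": doc_type,
            "sequence_group": run[0]["sequence"],  # first sequence in the run
            "doc_count": len(run),
        })
    return buckets


buckets = group_runs(docs)
for bucket in buckets:
    print(bucket)
```

This yields three buckets (A starting at 1, B starting at 3, A starting at 4), which is the shape being asked for from an aggregation.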

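The problem, along with some SQL solutions, is explained well at https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/. I would like to know whether there is a solution for Elasticsearch as well.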


2015-08-20 yaccob

You can always do a terms aggregation and then apply a top hits aggregation inside it to get this.

{
  "aggs": {
    "types": {
      "terms": {
        "field": "type"
      },
      "aggs": {
        "groups": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}


2015-08-20 18:57:39

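The top hits aggregation doesn't seem to solve the problem. With the aggregation you suggested I get two buckets, one for type "A" and one for type "B". I don't see how this takes the sequence gaps into account. – yaccob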
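
A rough way to see why only two buckets come back: a terms aggregation buckets documents purely by field value, regardless of their order. The Python sketch below imitates that behaviour on the sample data with a plain counter. It is an illustration under that assumption, not Elasticsearch code, and the variable names are hypothetical.

```python
from collections import Counter

# The five sample documents from the question.
docs = [
    {"sequence": 1, "type": "A"},
    {"sequence": 2, "type": "A"},
    {"sequence": 3, "type": "B"},
    {"sequence": 4, "type": "A"},
    {"sequence": 5, "type": "A"},
]

# Counting by value ignores where each document sits in the sequence,
# so both runs of type A collapse into a single bucket.
type_counts = Counter(d["type"] for d in docs)
terms_like_buckets = [
    {"key": key, "doc_count": count}
    for key, count in type_counts.most_common()  # highest count first
]

for bucket in terms_like_buckets:
    print(bucket)
```

The four type A documents end up together and the interruption at sequence 3 is lost, which matches the two buckets described in the comment.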
