Elasticsearch: filter documents grouped by a field

I have some documents:

{"name": "John", "district": 1}, 
{"name": "Mary", "district": 2}, 
{"name": "Nick", "district": 1}, 
{"name": "Bob", "district": 3}, 
{"name": "Kenny", "district": 1}

How can I filter/select distinct documents by district?

{"name": "John", "district": 1}, 
{"name": "Mary", "district": 2}, 
{"name": "Bob", "district": 3}

In SQL I could use GROUP BY. I tried a terms aggregation, but it only returns the distinct district values, not the documents.

"aggs": {
    "distinct": {
        "terms": {
            "field": "district",
            "size": 0
        }
    }
}

Thanks for your help! :-)

2014-09-23 Geany

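Did my answer solve your problem? – 2014-09-23 04:51:31

If your Elasticsearch version is 1.3 or higher, you can do this with a top_hits sub-aggregation. By default it returns the top three matching documents in each bucket, sorted by query score (here every score is 1, because a match_all query is used).

You can set the size parameter to return more than 3 documents per bucket.

With the following dataset and query: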

POST /test/districts/ 
{"name": "John", "district": 1} 

POST /test/districts/ 
{"name": "Mary", "district": 2} 

POST /test/districts/ 
{"name": "Nick", "district": 1} 

POST /test/districts/ 
{"name": "Bob", "district": 3} 

POST /test/districts/_search
{
    "size": 0,
    "aggs": {
        "by_district": {
            "terms": {
                "field": "district",
                "size": 0
            },
            "aggs": {
                "tops": {
                    "top_hits": {
                        "size": 10
                    }
                }
            }
        }
    }
}

This outputs the documents grouped the way you want:

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 4,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "by_district": {
            "buckets": [
                {
                    "key": 1,
                    "key_as_string": "1",
                    "doc_count": 2,
                    "tops": {
                        "hits": {
                            "total": 2,
                            "max_score": 1,
                            "hits": [
                                {
                                    "_index": "test",
                                    "_type": "districts",
                                    "_id": "XYHu4I-JQcOfLm3iWjTiOg",
                                    "_score": 1,
                                    "_source": {
                                        "name": "John",
                                        "district": 1
                                    }
                                },
                                {
                                    "_index": "test",
                                    "_type": "districts",
                                    "_id": "5dul2XMTRC2IpV_tKRRltA",
                                    "_score": 1,
                                    "_source": {
                                        "name": "Nick",
                                        "district": 1
                                    }
                                }
                            ]
                        }
                    }
                },
                {
                    "key": 2,
                    "key_as_string": "2",
                    "doc_count": 1,
                    "tops": {
                        "hits": {
                            "total": 1,
                            "max_score": 1,
                            "hits": [
                                {
                                    "_index": "test",
                                    "_type": "districts",
                                    "_id": "I-9Gd4OYSRuexhP1dCdQ-g",
                                    "_score": 1,
                                    "_source": {
                                        "name": "Mary",
                                        "district": 2
                                    }
                                }
                            ]
                        }
                    }
                },
                {
                    "key": 3,
                    "key_as_string": "3",
                    "doc_count": 1,
                    "tops": {
                        "hits": {
                            "total": 1,
                            "max_score": 1,
                            "hits": [
                                {
                                    "_index": "test",
                                    "_type": "districts",
                                    "_id": "bti2y-OUT3q2mBNhhI3xeA",
                                    "_score": 1,
                                    "_source": {
                                        "name": "Bob",
                                        "district": 3
                                    }
                                }
                            ]
                        }
                    }
                }
            ]
        }
    }
}
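
If only one document per district is wanted, as in the question's expected result, a variant of the same request might look like the sketch below, sent to the same /test/districts/_search endpoint. The aggregation name one_per_district is made up for illustration. With match_all every document scores 1, so which of the tied documents comes back for a district is not something to rely on unless a sort is added. The "size": 0 on the terms aggregation follows the 1.x-era usage in this answer; as far as I know, later Elasticsearch versions no longer accept it and expect an explicit bucket count instead.

```json
{
    "size": 0,
    "aggs": {
        "by_district": {
            "terms": {
                "field": "district",
                "size": 0
            },
            "aggs": {
                "one_per_district": {
                    "top_hits": {
                        "size": 1
                    }
                }
            }
        }
    }
}
```

Each bucket under by_district should then carry a single hit in one_per_district.hits.hits.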

2014-09-23 08:13:26 ThomasC

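Excellent, you saved my life! – Geany 2014-09-23 08:42:54

Hey @ThomasC, any idea how to also filter records aggregated like this? I've been trying for half an hour. Thanks! – lisak 2015-11-30 12:05:18

Hi @lisak! You can't nest aggregations under top_hits, but the reverse is possible. Try using a filter aggregation and nesting top_hits inside it. Alternatively, you can filter the results in the query part. – ThomasC 2015-12-17 08:48:45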
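
A rough sketch of the filter-aggregation approach from the comment above, as a request body for the same /test/districts/_search endpoint. The name filtered_docs and the range condition (districts 2 and up) are only placeholders, since the thread does not say what lisak wanted to filter on. The filter narrows the documents first, then the terms and top_hits aggregations nested under it group whatever is left.

```json
{
    "size": 0,
    "aggs": {
        "filtered_docs": {
            "filter": {
                "range": {
                    "district": {
                        "gte": 2
                    }
                }
            },
            "aggs": {
                "by_district": {
                    "terms": {
                        "field": "district"
                    },
                    "aggs": {
                        "tops": {
                            "top_hits": {
                                "size": 10
                            }
                        }
                    }
                }
            }
        }
    }
}
```

Against the sample data this should leave only the buckets for districts 2 and 3, nested under aggregations.filtered_docs.by_district.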

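Elasticsearch does not return distinct documents by a unique value, or documents grouped by a value. There is a workaround, though, if you are using the Java client (or you can port it to your own language):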

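import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;

// `client` is an existing org.elasticsearch.client.Client instance.
SearchResponse response = client.prepareSearch().execute().actionGet();
SearchHits hits = response.getHits();

// One entry per district; a later hit replaces an earlier one for the same district.
Iterator<SearchHit> iterator = hits.iterator();
Map<String, SearchHit> distinctObjects = new HashMap<String, SearchHit>();
while (iterator.hasNext()) {
    SearchHit searchHit = iterator.next();
    Map<String, Object> source = searchHit.getSource();
    // Skip documents that have no district field.
    if (source.get("district") != null) {
        // Store the hit itself; the map's value type is SearchHit, not the source map.
        distinctObjects.put(source.get("district").toString(), searchHit);
    }
}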

2014-09-23 04:43:59

What if you are using pagination? Fetching 10 results per page, would you end up with some pages of 8 results, others of 10, and others of 7? – 2017-12-12 15:35:37
