How can I limit ElasticSearch results by field value?

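We have a system that indexes resume documents in ElasticSearch using the mapper attachment plugin. Alongside each indexed document, I store some basic information, such as whether it is tied to an applicant or an employee, the person's name, and the ID they are assigned in the system. A query might look like this when it hits ES: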

{
    "size": 100,
    "query": {
        "query_string": {
            "query": "software AND (developer OR engineer)",
            "default_field": "fileData"
        }
    },
    "_source": {
        "includes": ["applicant.*", "employee.*"]
    }
}

and I get back results such as:

"hits": [100]
    0: {
        "_index": "careers"
        "_type": "resume"
        "_id": "AVEW8FJcqKzY6y-HB4tr"
        "_score": 0.4530588
        "_source": {
            "applicant": {
                "name": "John Doe"
                "id": 338338
            }
        }
    }...

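What I want to do is limit the results so that if John Doe with ID 338338 has three different resumes in the system that all match the query, I only get back one match, preferably the highest-scoring one (though that matters less, as long as I can find the person). I have been trying different options with filters and aggregations, but I have not yet stumbled on a way to do this.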

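There are various ways I could handle this in the application that calls ES, but it would be better if I could do it on the ES side. Since I limit the query to 100 results, I want to get back 100 individual people, rather than getting back 100 results and then finding that 25% of them are documents tied to the same person.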
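For comparison, the application-side workaround mentioned above could be sketched roughly as follows. This is only a minimal illustration in Python, assuming the response has already been parsed into dicts shaped like the hit shown earlier. `person_key` and `dedupe_hits` are hypothetical helper names, and the second person in the sample data is made up.

```python
def person_key(hit):
    """Return a (kind, id) tuple identifying the person behind a hit, or None."""
    source = hit.get("_source", {})
    for kind in ("applicant", "employee"):
        person = source.get(kind)
        if person and "id" in person:
            return (kind, person["id"])
    return None


def dedupe_hits(hits, limit=100):
    """Keep the highest-scoring hit per person, best first, at most `limit` people."""
    best = {}
    for hit in hits:
        key = person_key(hit)
        if key is None:
            continue  # skip documents with no person attached
        if key not in best or hit["_score"] > best[key]["_score"]:
            best[key] = hit
    ranked = sorted(best.values(), key=lambda h: h["_score"], reverse=True)
    return ranked[:limit]


# Made-up sample hits: two resumes for the same applicant, one for an employee.
sample_hits = [
    {"_id": "a", "_score": 0.45, "_source": {"applicant": {"name": "John Doe", "id": 338338}}},
    {"_id": "b", "_score": 0.31, "_source": {"applicant": {"name": "John Doe", "id": 338338}}},
    {"_id": "c", "_score": 0.40, "_source": {"employee": {"name": "Jane Roe", "id": 12}}},
]
unique = dedupe_hits(sample_hits)
print([h["_id"] for h in unique])
```

The drawback is the one described above: to end up with 100 people, the application has to request more than 100 hits and cannot know in advance how many extra it needs.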


2016-02-19 ckasek

`applicant.id` is unique, right? Your question has a similar intent to this one: http://stackoverflow.com/questions/35490641/elasticsearch-filter-the-maximum-value-document/35492605#35492605 – IanGabes

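What you want to do is run an aggregation to get the top 100 unique records, and then a sub-aggregation asking for the "top_hits". Here is an example from my system. In my example I am: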

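setting the result size to 0, because I only care about the aggregations
setting the size of the aggregation to 100
getting the top 1 result for each bucket of the aggregation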

GET index1/type1/_search
{
    "size": 0,
    "aggs": {
        "a1": {
            "terms": {
                "field": "input.user.name",
                "size": 100
            },
            "aggs": {
                "topHits": {
                    "top_hits": {
                        "size": 1
                    }
                }
            }
        }
    }
}
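Adapted to the index from the question, the same idea might look like the sketch below. Python is used only to build the request body and read the response. The aggregation names `by_applicant` and `top_resume`, the helper names, and the sample response are assumptions for illustration, and actually sending the request to a cluster is left out.

```python
import json


def build_request(query_text, people=100):
    """Build a search body: one bucket per applicant.id, one top hit per bucket."""
    return {
        "size": 0,  # no plain hits, only aggregations
        "query": {
            "query_string": {"query": query_text, "default_field": "fileData"}
        },
        "aggs": {
            "by_applicant": {  # hypothetical aggregation name
                "terms": {"field": "applicant.id", "size": people},
                "aggs": {
                    "top_resume": {  # hypothetical aggregation name
                        "top_hits": {
                            "size": 1,
                            "_source": {"includes": ["applicant.*"]},
                        }
                    }
                },
            }
        },
    }


def one_hit_per_person(response):
    """Pull the single top hit out of every applicant bucket."""
    buckets = response["aggregations"]["by_applicant"]["buckets"]
    return [b["top_resume"]["hits"]["hits"][0] for b in buckets]


# Hand-written sample response, trimmed to the fields read above.
sample_response = {
    "aggregations": {
        "by_applicant": {
            "buckets": [
                {
                    "key": 338338,
                    "doc_count": 3,
                    "top_resume": {
                        "hits": {
                            "hits": [
                                {
                                    "_id": "AVEW8FJcqKzY6y-HB4tr",
                                    "_score": 0.4530588,
                                    "_source": {"applicant": {"name": "John Doe", "id": 338338}},
                                }
                            ]
                        }
                    },
                }
            ]
        }
    }
}

print(json.dumps(build_request("software AND (developer OR engineer)"), indent=2))
for hit in one_hit_per_person(sample_response):
    print(hit["_source"]["applicant"]["name"], hit["_score"])
```

One thing to keep in mind: a terms aggregation orders its buckets by document count by default, so the 100 buckets are the people with the most matching resumes, not necessarily the people with the best-scoring ones. Within each bucket, `top_hits` returns the highest-scoring document first.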


2016-02-19 21:28:24 jhilden

Using the answer above and the link from IanGabes, I was able to adjust my search like so:

{
    "size": 0,
    "query": {
        "query_string": {
            "query": "software AND (developer OR engineer)",
            "default_field": "fileData"
        }
    },
    "aggregations": {
        "employee": {
            "terms": {
                "field": "employee.id",
                "size": 100
            },
            "aggregations": {
                "score": {
                    "max": {
                        "script": "scores"
                    }
                }
            }
        },
        "applicant": {
            "terms": {
                "field": "applicant.id",
                "size": 100
            },
            "aggregations": {
                "score": {
                    "max": {
                        "script": "scores"
                    }
                }
            }
        }
    }
}

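This gets me back two buckets: one containing all the applicant IDs with the highest score among their matching documents, and the same for employees. The script is nothing more than a script file on the shard that contains `_score` as its content.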
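Reading that response in the calling application might look something like this. It is a rough Python sketch, assuming the usual response shape for a terms aggregation with a max sub-aggregation named `score`. The helper names and the sample values for the employee are made up for illustration.

```python
def max_scores(response, agg_name):
    """Map each bucket key (a person ID) to the max score reported for it."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["score"]["value"] for b in buckets}


def top_people(response, limit=100):
    """Merge the employee and applicant buckets and rank people by max score."""
    ranked = []
    for kind in ("employee", "applicant"):
        for person_id, score in max_scores(response, kind).items():
            ranked.append((score, kind, person_id))
    ranked.sort(reverse=True)  # highest score first
    return ranked[:limit]


# Hand-written sample response, trimmed to the fields read above.
sample_response = {
    "aggregations": {
        "employee": {
            "buckets": [{"key": 12, "doc_count": 1, "score": {"value": 0.2}}]
        },
        "applicant": {
            "buckets": [{"key": 338338, "doc_count": 3, "score": {"value": 0.4530588}}]
        },
    }
}

for score, kind, person_id in top_people(sample_response):
    print(kind, person_id, score)
```

Note that this response carries only IDs, document counts and scores. If the name or the matching document is needed as well, it has to be fetched separately or returned through a `top_hits` sub-aggregation as in the earlier answer.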


2016-02-22 19:10:52 ckasek
