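Elasticsearch 5.2.2: case-insensitive terms aggregation

I'm trying to run a case-insensitive aggregation on a keyword-type field, but I'm having trouble getting it to work.
What I have tried so far is adding a custom analyzer named "lowercase" that uses the "keyword" tokenizer and the "lowercase" filter. I then added a sub-field named "use_lowercase" to the mapping of the field I want to work with. I want to keep the existing "text" and "keyword" field components as well, since I may want to search for terms within that field.
Here is the index definition, including the custom analyzer: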
PUT authors
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  },
  "mappings": {
    "famousbooks": {
      "properties": {
        "Author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "use_lowercase": {
              "type": "text",
              "analyzer": "lowercase"
            }
          }
        }
      }
    }
  }
}
Now I add two records with the same author, but with different casing:
POST authors/famousbooks/1
{
  "Book": "The Mysterious Affair at Styles",
  "Year": 1920,
  "Price": 5.92,
  "Genre": "Crime Novel",
  "Author": "Agatha Christie"
}
POST authors/famousbooks/2
{
  "Book": "And Then There Were None",
  "Year": 1939,
  "Price": 6.99,
  "Genre": "Mystery Novel",
  "Author": "Agatha christie"
}
So far so good. Now, if I run a terms aggregation on the author:
GET authors/famousbooks/_search
{
  "size": 0,
  "aggs": {
    "authors-aggs": {
      "terms": {
        "field": "Author.use_lowercase"
      }
    }
  }
}
I get the following result:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "authors",
        "node": "yxcoq_eKRL2r6JGDkshjxg",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
    }
  },
  "status": 400
}
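So it appears to me that the aggregation treats the field as text rather than keyword, and hence gives me the fielddata error. I would have thought ES was sophisticated enough to recognize that the field effectively holds a single keyword term (because of the custom analyzer) and is therefore aggregatable, but that doesn't seem to be the case.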
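If I add "fielddata":true to the mapping for Author, the aggregation works fine, but I'm hesitant to do this given the dire warnings about high heap usage when this value is set.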
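A minimal sketch of that change, assuming the flag is set on the use_lowercase sub-field (the field named in the error message) rather than on the top-level Author field, and assuming it is applied to the existing index through the put-mapping endpoint:

```json
PUT authors/_mapping/famousbooks
{
  "properties": {
    "Author": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        },
        "use_lowercase": {
          "type": "text",
          "analyzer": "lowercase",
          "fielddata": true
        }
      }
    }
  }
}
```

The only difference from the original mapping is the "fielddata": true line; the analyzer and the other sub-field are repeated unchanged.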
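Is there a best practice for doing this kind of case-insensitive keyword aggregation? I was hoping I could just say "type":"keyword", "filter":"lowercase" in the mappings section, but that doesn't appear to be available.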
It feels like I'm using too big a stick to get this working if I go the "fielddata":true route. Any help would be appreciated!
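Unfortunately, specifying 'type:keyword' instead of 'type:text' fails if you also specify an analyzer (as I do here to get the lowercasing to work). The error message when setting the mapping: Mapping definition for [fields] has unsupported parameters: [analyzer:lowercase] – GoodEnuf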
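That error is consistent with keyword fields not accepting an analyzer. One possible alternative, offered as an untested sketch: Elasticsearch 5.2 introduced a normalizer setting for keyword fields (marked experimental in that release), which applies token filters such as lowercase while the field stays a keyword. The index name authors_normalized, the normalizer name lowercase_normalizer and the sub-field name lowercase below are made up for illustration and do not come from the question.

```json
PUT authors_normalized
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "famousbooks": {
      "properties": {
        "Author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "lowercase": {
              "type": "keyword",
              "normalizer": "lowercase_normalizer"
            }
          }
        }
      }
    }
  }
}
```

With a mapping along these lines, a terms aggregation on Author.lowercase should not require fielddata, and the two spellings of the author would be expected to fall into the same bucket. Existing documents would need to be reindexed for the new sub-field to be populated.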