2017-02-28

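Elasticsearch 5.2.2: Terms aggregation case insensitive

I'm trying to do a case-insensitive aggregation on a keyword-type field, but I'm having trouble getting it to work.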

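What I've tried so far is adding a custom analyzer called "lowercase", which uses the "keyword" tokenizer and the "lowercase" filter. I then added a sub-field called "use_lowercase" to the mapping of the field I want to aggregate on. I want to keep the existing "text" and "keyword" components of the field, because I may want to search for terms within it.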

Here is the index definition, including the custom analyzer:

PUT authors
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  },
  "mappings": {
    "famousbooks": {
      "properties": {
        "Author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "use_lowercase": {
              "type": "text",
              "analyzer": "lowercase"
            }
          }
        }
      }
    }
  }
}
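The custom analyzer above chains the keyword tokenizer, which emits the entire input as a single token, with the lowercase token filter. The following is a rough Python model of that chain, for illustration only: the function names are hypothetical, `str.lower()` only approximates Lucene's lowercase filter (they agree for ASCII input such as these author names), and the whitespace split is a simplified stand-in for the standard analyzer used by the main `Author` text field.

```python
from typing import Callable, List


def keyword_tokenizer(text: str) -> List[str]:
    # The keyword tokenizer emits the whole input as one token.
    return [text]


def whitespace_tokenizer(text: str) -> List[str]:
    # Simplified stand-in for the standard tokenizer: split on whitespace only.
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    # Approximation of the "lowercase" token filter (exact for ASCII text).
    return [token.lower() for token in tokens]


def analyze(text: str, tokenizer: Callable[[str], List[str]]) -> List[str]:
    # An analyzer runs the tokenizer first, then each token filter in order.
    return lowercase_filter(tokenizer(text))


if __name__ == "__main__":
    for author in ["Agatha Christie", "Agatha christie"]:
        print(author, analyze(author, keyword_tokenizer), analyze(author, whitespace_tokenizer))
```

Both spellings reduce to the single token `agatha christie`, which is why the analyzer looks keyword-like. The sub-field is nonetheless still mapped as `text`, and that mapping type is what the aggregation checks.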

Now I add two records with the same author, but with different capitalization:

POST authors/famousbooks/1 
{ 
    "Book": "The Mysterious Affair at Styles", 
    "Year": 1920, 
    "Price": 5.92, 
    "Genre": "Crime Novel", 
    "Author": "Agatha Christie" 
} 

POST authors/famousbooks/2 
{ 
    "Book": "And Then There Were None", 
    "Year": 1939, 
    "Price": 6.99, 
    "Genre": "Mystery Novel", 
    "Author": "Agatha christie" 
} 

So far so good. Now, if I run a terms aggregation on the author field:

GET authors/famousbooks/_search
{
  "size": 0,
  "aggs": {
    "authors-aggs": {
      "terms": {
        "field": "Author.use_lowercase"
      }
    }
  }
}

I get the following result:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "authors",
        "node": "yxcoq_eKRL2r6JGDkshjxg",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
    }
  },
  "status": 400
}

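So it seems to me that the aggregation treats the field as text rather than keyword, and therefore gives me the fielddata warning. I thought ES would be smart enough to recognize that the field is effectively a keyword (because of the custom analyzer) and can therefore be aggregated, but that doesn't seem to be the case.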

If I add "fielddata":true to the Author mapping, the aggregation works fine, but I'm hesitant because of the dire warnings about high heap usage when this setting is enabled.

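Is there a best practice for this kind of case-insensitive keyword aggregation? I was hoping I could just say "type":"keyword", "filter":"lowercase" in the mappings section, but that doesn't seem to be available.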

If I go the "fielddata":true route, it feels like I'm using too big a stick to get this working. Any help would be appreciated!

Answers


But you also defined use_lowercase as text:

"use_lowercase": { "type": "text", "analyzer": "lowercase" }

Try defining it as type: keyword. That helped me with a similar problem I had with sorting.


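Unfortunately, specifying 'type:keyword' instead of 'type:text' fails if you also specify an analyzer (which I need here to get the lowercasing to work). The error message when setting up the mapping is: Mapping definition for [fields] has unsupported parameters: [analyzer:lowercase] – GoodEnuf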


It turns out the solution is to use a custom normalizer instead of a custom analyzer.

PUT authors
{
  "settings": {
    "analysis": {
      "normalizer": {
        "myLowercase": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "famousbooks": {
      "properties": {
        "Author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "use_lowercase": {
              "type": "keyword",
              "normalizer": "myLowercase",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

This then lets the terms aggregation use the field Author.use_lowercase without any problems.
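To illustrate what the normalizer changes for the terms aggregation, here is a small Python model. It is not Elasticsearch code, and the names are hypothetical. It buckets the two sample documents' Author values the way a keyword field would: first as-is (like Author.keyword), then through a lowercase normalizer (like Author.use_lowercase). Values longer than ignore_above (256 in the mapping) are skipped, because such values are not indexed in that sub-field. In this sketch the length check is applied to the original value, and `str.lower()` again only approximates the lowercase filter.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple

IGNORE_ABOVE = 256  # mirrors "ignore_above": 256 in the mapping


def lowercase_normalizer(value: str) -> str:
    # Approximates the custom normalizer built from the "lowercase" filter.
    return value.lower()


def terms_buckets(
    values: Iterable[str],
    normalizer: Optional[Callable[[str], str]] = None,
) -> List[Tuple[str, int]]:
    counts: Counter = Counter()
    for value in values:
        if len(value) > IGNORE_ABOVE:
            continue  # not indexed in this sub-field, so it produces no bucket
        key = normalizer(value) if normalizer else value
        counts[key] += 1
    # Order by doc_count descending; ties are broken by key for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    authors = ["Agatha Christie", "Agatha christie"]
    print(terms_buckets(authors))                        # like Author.keyword
    print(terms_buckets(authors, lowercase_normalizer))  # like Author.use_lowercase
```

Against the real index, the equivalent check is the GET authors/famousbooks/_search request from the question. With the normalizer mapping, it should return a single bucket, keyed by the lowercased value, that covers both documents.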