Elasticsearch 5.2.2: case-insensitive terms aggregation

I'm trying to run a case-insensitive aggregation on a keyword-type field, but I'm having trouble getting it to work.

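What I have tried so far is to add a custom analyzer called "lowercase" that uses the "keyword" tokenizer and the "lowercase" filter. I then added a sub-field called "use_lowercase" to the mapping of the field I want to work with. I want to keep the existing "text" and "keyword" field components as well, since I may want to search for terms within the field.

Here is the index definition, including the custom analyzer: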

PUT authors
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  },
  "mappings": {
    "famousbooks": {
      "properties": {
        "Author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "use_lowercase": {
              "type": "text",
              "analyzer": "lowercase"
            }
          }
        }
      }
    }
  }
}
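
As a rough illustration of what this analyzer does to a value, here is a toy model in Python. It is not Elasticsearch code: the function names are hypothetical, and Python's `str.lower()` only approximates the behavior of the real lowercase filter.

```python
# Toy model of the custom "lowercase" analyzer from the index definition.
# All function names are made up for illustration.

def keyword_tokenizer(text):
    # The keyword tokenizer emits the entire input as a single token.
    return [text]

def lowercase_filter(tokens):
    # The lowercase filter lowercases every token it receives.
    return [token.lower() for token in tokens]

def lowercase_analyzer(text):
    # Tokenizer first, then the filter chain.
    return lowercase_filter(keyword_tokenizer(text))

print(lowercase_analyzer("Agatha Christie"))
print(lowercase_analyzer("Agatha christie"))
```

Both calls yield the same single token, which is why the analyzed value looks like a keyword even though the sub-field is still mapped as `text`.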

Now I add 2 records with the same author, but with different casing:

POST authors/famousbooks/1
{
  "Book": "The Mysterious Affair at Styles",
  "Year": 1920,
  "Price": 5.92,
  "Genre": "Crime Novel",
  "Author": "Agatha Christie"
}

POST authors/famousbooks/2
{
  "Book": "And Then There Were None",
  "Year": 1939,
  "Price": 6.99,
  "Genre": "Mystery Novel",
  "Author": "Agatha christie"
}

So far so good. Now, if I run a terms aggregation on the author,

GET authors/famousbooks/_search
{
  "size": 0,
  "aggs": {
    "authors-aggs": {
      "terms": {
        "field": "Author.use_lowercase"
      }
    }
  }
}

I get the following result:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "authors",
        "node": "yxcoq_eKRL2r6JGDkshjxg",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
    }
  },
  "status": 400
}

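So it looks to me like the aggregation treats the field as text rather than keyword, and therefore gives me the fielddata error. I would have thought ES would be sophisticated enough to recognize that the terms field is effectively a keyword (via the custom analyzer) and therefore aggregatable, but that doesn't appear to be the case.

If I add "fielddata": true to the mapping for Author, the aggregation works fine, but I'm hesitant to do that given the dire warnings about high heap usage when this value is set.

Is there a best practice for doing this kind of case-insensitive keyword aggregation? I was hoping I could just say "type": "keyword", "filter": "lowercase" in the mapping section, but that doesn't appear to be available.

It feels like I'm having to use too big a stick to get this working if I go the "fielddata": true route. Any help would be appreciated!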

2017-02-28 GoodEnuf

But you have also defined use_lowercase as text:

"use_lowercase": { "type": "text", "analyzer": "lowercase" }

Try defining it as type: keyword - that helped me with a similar problem I had with sorting.

2017-02-28 22:27:02 paqash

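Unfortunately, if you also specify an analyzer (as I do here to get the lowercasing to work), specifying 'type: keyword' instead of 'type: text' fails. The error message when setting the mapping is: Mapping definition for [fields] has unsupported parameters: [analyzer : lowercase] – GoodEnuf

It turns out the solution is to use a custom normalizer instead of a custom analyzer.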

PUT authors
{
  "settings": {
    "analysis": {
      "normalizer": {
        "myLowercase": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "famousbooks": {
      "properties": {
        "Author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "use_lowercase": {
              "type": "keyword",
              "normalizer": "myLowercase",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

This then allows the terms aggregation to use the field Author.use_lowercase without any problems.
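
To see why the normalizer collapses the two spellings into one bucket, here is a minimal Python sketch of the idea. It is only a model, not Elasticsearch code: `my_lowercase` and `terms_agg` are hypothetical helpers, and the real terms aggregation works on indexed terms per shard rather than on a Python list.

```python
from collections import Counter

def my_lowercase(value):
    # Stand-in for the "myLowercase" normalizer: the whole value is
    # kept as a single term and lowercased before it is indexed.
    return value.lower()

def terms_agg(values, normalizer=None):
    # Count documents per (optionally normalized) term, most frequent first.
    keys = (normalizer(v) if normalizer else v for v in values)
    counts = Counter(keys)
    return [{"key": k, "doc_count": n} for k, n in counts.most_common()]

authors = ["Agatha Christie", "Agatha christie"]

# Without normalization the two spellings are distinct terms.
print(terms_agg(authors))

# With the lowercase normalizer they share one term.
print(terms_agg(authors, my_lowercase))
```

In this model the first call produces two buckets with one document each, and the second produces a single "agatha christie" bucket with a count of 2. Note that the bucket key is the lowercased form, not the original casing.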

2017-03-01 23:28:07 GoodEnuf
