Elasticsearch fielddata - should I use it?

Given documents in an index that have a brand property, we need to create a terms aggregation that is case insensitive.

Index definition

Please note the use of fielddata.

PUT demo_products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "brand": {
          "type": "text",
          "analyzer": "my_custom_analyzer",
          "fielddata": true
        }
      }
    }
  }
}

Data

POST demo_products/product 
{ 
    "brand": "New York Jets" 
} 

POST demo_products/product 
{ 
    "brand": "new york jets" 
} 

POST demo_products/product 
{ 
    "brand": "Washington Redskins" 
}

Query

GET demo_products/product/_search
{
  "size": 0,
  "aggs": {
    "brand_facet": {
      "terms": {
        "field": "brand"
      }
    }
  }
}

Result

"aggregations": {
  "brand_facet": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "new york jets",
        "doc_count": 2
      },
      {
        "key": "washington redskins",
        "doc_count": 1
      }
    ]
  }
}

If we use keyword instead of text, we end up with 2 buckets for New York Jets because of the difference in casing.
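To make the bucket counts concrete, here is a minimal Python sketch that mimics how a terms aggregation groups the three sample documents. It is only a model of the grouping, not Elasticsearch code: `terms_buckets` is a hypothetical helper, the tie-break ordering is an assumption, and the lowercasing stands in for the keyword tokenizer plus lowercase filter used in the mapping above.

```python
from collections import Counter

# The three sample "brand" values indexed above.
brands = ["New York Jets", "new york jets", "Washington Redskins"]


def terms_buckets(values, transform=lambda v: v):
    """Hypothetical helper: group values by their (optionally transformed) term."""
    counts = Counter(transform(v) for v in values)
    # Sort by doc_count descending, then key ascending (assumed tie-break).
    return [
        {"key": key, "doc_count": count}
        for key, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ]


# Plain keyword field: terms are kept exactly as sent, so casing splits the Jets.
keyword_buckets = terms_buckets(brands)

# Lowercased terms (the custom analyzer's effect): both spellings share one bucket.
lowercased_buckets = terms_buckets(brands, str.lower)

print(keyword_buckets)
print(lowercased_buckets)
```

With the raw values, the two spellings of New York Jets land in separate buckets; once every value is lowercased they collapse into one, which matches the result shown above.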

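We are concerned about the performance implications of using fielddata. However, if fielddata is disabled we get the dreaded "Fielddata is disabled on text fields by default."

Any other tips for resolving this - or should we not be so concerned about fielddata?

Source

2017-01-26 Rasmus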

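How big are the machines hosting the ES instances (CPU, memory)? How many documents are we talking about? How many indices?

300,000 documents spread over 28 indices, hosted on Elastic Cloud (3 servers, currently 4 GB) – Rasmus

Hmm, why so many indices for so few documents?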

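Starting with ES 5.2 (released today), you can use normalizers with keyword fields in order to (for example) lowercase the value.

Normalizers work a bit like analyzers do for text fields. What you can do with them is more restricted, but they should help with the issue you are facing.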
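As a rough mental model (an assumption for illustration, not how Elasticsearch is implemented), a normalizer can be pictured as a chain of filters applied to the whole field value, with no tokenizer splitting it into words. The names `make_normalizer` and `my_normalizer` below are hypothetical Python stand-ins.

```python
from functools import reduce


def make_normalizer(*filters):
    """Hypothetical model: apply each filter, in order, to the entire value.

    There is no tokenization step, so one input value
    always yields exactly one term.
    """
    def normalize(value):
        return reduce(lambda acc, f: f(acc), filters, value)
    return normalize


# Stand-in for a custom normalizer whose only filter is "lowercase".
my_normalizer = make_normalizer(str.lower)

for brand in ["New York Jets", "new york jets", "Washington Redskins"]:
    print(repr(brand), "->", repr(my_normalizer(brand)))
```

Because both spellings of the Jets normalize to the same single term, a terms aggregation on the keyword field can group them without needing fielddata on a text field.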

You would create the index like this:

PUT demo_products
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "brand": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

And your query would return this:

"aggregations" : {
  "brand_facet" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "new york jets",
        "doc_count" : 2
      },
      {
        "key" : "washington redskins",
        "doc_count" : 1
      }
    ]
  }
}

Best of both worlds!

Source

2017-02-01 05:12:20 Val
