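Nested sort on Japanese text using icu_collation causes CircuitBreakingException
I'm using Elasticsearch 2.4 and added the icu_analysis plugin to provide sorting for Japanese text. It works well enough in my local environment, which has a limited number of documents, but when I try it on a more realistic data set, the query fails with the following CircuitBreakingException: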
"CircuitBreakingException[[fielddata] Data too large, data for [translations.name.jp_sort] would be larger than limit of [10239895142/9.5gb]]"
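As far as I understand, this happens when sorting on a field's fielddata across a large number of documents, and doc values should be used instead. However, I'm not sure whether that can be made to work in this case, or why it isn't already happening.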
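The numbers in the exception can be read back with plain arithmetic. This is a minimal sketch that assumes the fielddata breaker (`indices.breaker.fielddata.limit`) is at its ES 2.x default of 60% of the JVM heap; the question does not state the actual setting. Under that assumption the ~9.5 GB limit corresponds to a node heap of roughly 16 GB, and all fielddata loaded on that node, including the Japanese sort keys, has to stay below the limit.

```python
# Sketch: decode the limit reported in the CircuitBreakingException.
# Assumption: indices.breaker.fielddata.limit is at its ES 2.x default
# of 60% of the JVM heap (not confirmed by the question).

GIB = 1024 ** 3  # Elasticsearch's "gb" in byte-size messages means 2^30 bytes

breaker_limit_bytes = 10_239_895_142  # from "[10239895142/9.5gb]"
assumed_limit_ratio = 0.60            # assumed default fielddata breaker ratio

limit_gib = breaker_limit_bytes / GIB
implied_heap_gib = breaker_limit_bytes / assumed_limit_ratio / GIB

print(f"fielddata breaker limit: {limit_gib:.1f} GiB")
print(f"implied max heap (if limit = 60% of heap): {implied_heap_gib:.1f} GiB")
```

If the cluster uses a non-default breaker ratio, only `assumed_limit_ratio` needs to change.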
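The index holds about 470 million documents, which store their translations as nested documents; only about 35 million of them contain a Japanese translation. Here is the mapping: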
{
  "settings" : {
    "number_of_shards" : 6,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "trigrams_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        },
        "japanese_ordering": {
          "type": "icu_collation",
          "language": "ja",
          "country": "JP"
        }
      },
      "analyzer": {
        "trigrams": {
          "tokenizer": "my_ngram_tokenizer",
          "filter": "lowercase"
        },
        "japanese_ordering": {
          "tokenizer": "keyword",
          "filter": [ "japanese_ordering" ]
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "3",
          "max_gram": "3",
          "token_chars": [
            "letter",
            "digit",
            "symbol",
            "punctuation"
          ]
        }
      }
    }
  },
  "mappings" : {
    "product" : {
      "_all" : {
        "enabled" : false
      },
      "properties" : {
        "name" : {
          "type" : "string",
          "analyzer": "trigrams",
          "fields": {
            "value" : {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "record_status" : {
          "type" : "integer"
        },
        "categories" : {
          "type" : "integer"
        },
        "variant_status" : {
          "type" : "integer"
        },
        "visit_count" : {
          "type" : "integer"
        },
        "translations": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "string",
              "fields": {
                "jp_sort": {
                  "type": "string",
                  "analyzer": "japanese_ordering"
                }
              }
            },
            "language_id": {
              "type": "short"
            }
          }
        }
      }
    }
  }
}
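One way to see why doc values are not being used: in ES 2.x, doc values are supported for string fields only when they are `not_analyzed`, and `translations.name.jp_sort` goes through the custom `japanese_ordering` analyzer, so sorting on it loads fielddata onto the heap. The sketch below walks a trimmed copy of the mapping above (as a Python dict) and lists every analyzed string field; `analyzed_string_fields` is a hypothetical helper for illustration, not an Elasticsearch API.

```python
# Sketch: list analyzed "string" fields in an ES 2.x mapping.
# In ES 2.x, analyzed string fields cannot use doc values, so sorting
# on them falls back to in-heap fielddata.


def _is_analyzed_string(spec):
    # ES 2.x strings default to "index": "analyzed" when unspecified.
    return spec.get("type") == "string" and spec.get("index", "analyzed") == "analyzed"


def analyzed_string_fields(properties, prefix=""):
    """Return dotted paths of analyzed string fields, including multi-fields."""
    found = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if _is_analyzed_string(spec):
            found.append(path)
        # Multi-fields such as "name.jp_sort" live under "fields".
        for sub_name, sub_spec in spec.get("fields", {}).items():
            if _is_analyzed_string(sub_spec):
                found.append(f"{path}.{sub_name}")
        # Object and nested fields carry their own "properties".
        if "properties" in spec:
            found.extend(analyzed_string_fields(spec["properties"], prefix=f"{path}."))
    return found


# Trimmed copy of the "product" properties from the question.
product_properties = {
    "name": {
        "type": "string",
        "analyzer": "trigrams",
        "fields": {"value": {"type": "string", "index": "not_analyzed"}},
    },
    "visit_count": {"type": "integer"},
    "translations": {
        "type": "nested",
        "properties": {
            "name": {
                "type": "string",
                "fields": {"jp_sort": {"type": "string", "analyzer": "japanese_ordering"}},
            },
            "language_id": {"type": "short"},
        },
    },
}

result = analyzed_string_fields(product_properties)
print(result)
```

The `not_analyzed` sub-field `name.value` is not reported, since it can use doc values; the sort field `translations.name.jp_sort` is.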
And this is the query that trips the circuit breaker:
{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "should": [],
      "must_not": [],
      "must": [{
        "nested": {
          "path": "translations",
          "score_mode": "max",
          "query": {
            "bool": {
              "must": [{
                "match": {
                  "translations.name": {
                    "query": "\u30C6\u30B9\u30C8",
                    "boost": 5
                  }
                }
              }]
            }
          }
        }
      }]
    }
  },
  "filter": {
    "bool": {
      "must": [{
        "terms": {
          "variant_status": ["1"],
          "_cache": true
        }
      }, {
        "nested": {
          "path": "translations",
          "query": {
            "bool": {
              "must": [{
                "term": {
                  "translations.language_id": 9,
                  "_cache": true
                }
              }]
            }
          }
        }
      }, {
        "term": {
          "record_status": 1,
          "_cache": true
        }
      }],
      "must_not": [{
        "term": {
          "product_collections": 0
        }
      }]
    }
  },
  "sort": [{
    "translations.name.jp_sort": {
      "order": "asc",
      "nested_path": "translations"
    }
  }]
}
Elasticsearch 5.5 introduced a new field type called 'icu_collation_keyword' that solves the problem you're running into. You can read more about it here: https://www.elastic.co/blog/elasticsearch-5-5-0-released – Val
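That did indeed solve it. I spent a few hours updating my queries and indexer for the version change and then tried icu_collation_keyword. It works well, and it's very fast! If you'd like to post your comment as an answer, I'll mark it as accepted. Thanks! –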
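For reference, a minimal sketch of what the ES 5.5+ change might look like, built as Python dicts and printed as JSON. It assumes Elasticsearch 5.5 or later with the analysis-icu plugin installed, `string` migrated to `text`, and the mapping type still named `product`; only the `translations` part is shown. The `jp_sort` sub-field becomes an `icu_collation_keyword` field, which stores collation keys in doc values by default, so sorting no longer needs heap fielddata.

```python
import json

# Sketch of the ES 5.5+ approach from the comments (assumptions: ES >= 5.5,
# analysis-icu plugin installed, "string" migrated to "text").
mapping = {
    "mappings": {
        "product": {
            "properties": {
                "translations": {
                    "type": "nested",
                    "properties": {
                        "name": {
                            "type": "text",
                            "fields": {
                                # Collation keys go to doc values (enabled by default),
                                # so sorting does not load fielddata onto the heap.
                                "jp_sort": {
                                    "type": "icu_collation_keyword",
                                    "index": False,  # only used for sorting
                                    "language": "ja",
                                    "country": "JP",
                                }
                            },
                        },
                        "language_id": {"type": "short"},
                    },
                }
            }
        }
    }
}

# Same nested sort as in the question; ES 5.x still accepts "nested_path".
sort_clause = [
    {
        "translations.name.jp_sort": {
            "order": "asc",
            "nested_path": "translations",
        }
    }
]

print(json.dumps(mapping, indent=2))
print(json.dumps({"sort": sort_clause}, indent=2))
```

With this mapping, the `japanese_ordering` analyzer and token filter from the 2.4 settings are presumably no longer needed for sorting and could be dropped if nothing else uses them.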