Nested sorting of Japanese text with icu_collation causes a CircuitBreakingException

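I'm using Elasticsearch 2.4 and have added the icu_analysis plugin to provide sorting on Japanese text. It works well enough in my local environment, which has a limited number of documents, but when I try it on a more realistic dataset the query fails with the following CircuitBreakingException: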

"CircuitBreakingException[[fielddata] Data too large, data for [translations.name.jp_sort] would be larger than limit of [10239895142/9.5gb]]"

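As far as I understand, this happens when sorting on fielddata for a field with a large document count, and doc values should be used instead. But I'm not sure whether that can be done in this case, or why it isn't already happening.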

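The index holds about 470 million documents, which store translations as nested documents; only about 35 million of the full set contain a Japanese translation. Here is the mapping for the documents: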

{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "trigrams_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        },
        "japanese_ordering": {
          "type": "icu_collation",
          "language": "ja",
          "country": "JP"
        }
      },
      "analyzer": {
        "trigrams": {
          "tokenizer": "my_ngram_tokenizer",
          "filter": "lowercase"
        },
        "japanese_ordering": {
          "tokenizer": "keyword",
          "filter": [ "japanese_ordering" ]
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "3",
          "max_gram": "3",
          "token_chars": [
            "letter",
            "digit",
            "symbol",
            "punctuation"
          ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "trigrams",
          "fields": {
            "value": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "record_status": {
          "type": "integer"
        },
        "categories": {
          "type": "integer"
        },
        "variant_status": {
          "type": "integer"
        },
        "visit_count": {
          "type": "integer"
        },
        "translations": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "string",
              "fields": {
                "jp_sort": {
                  "type": "string",
                  "analyzer": "japanese_ordering"
                }
              }
            },
            "language_id": {
              "type": "short"
            }
          }
        }
      }
    }
  }
}

And this is the query that trips the circuit breaker:

{ 
    "from": 0, 
    "size": 20, 
    "query": { 
     "bool": { 
      "should": [], 
      "must_not": [], 
      "must": [{ 
       "nested": { 
        "path": "translations", 
        "score_mode": "max", 
        "query": { 
         "bool": { 
          "must": [{ 
           "match": { 
            "translations.name": { 
             "query": "\u30C6\u30B9\u30C8", 
             "boost": 5 
            } 
           } 
          }] 
         } 
        } 
       } 
      }] 
     } 
    }, 
    "filter": { 
     "bool": { 
      "must": [{ 
       "terms": { 
        "variant_status": ["1"], 
        "_cache": true 
       } 
      }, { 
       "nested": { 
        "path": "translations", 
        "query": { 
         "bool": { 
          "must": [{ 
           "term": { 
            "translations.language_id": 9, 
            "_cache": true 
           } 
          }] 
         } 
        } 
       } 
      }, { 
       "term": { 
        "record_status": 1, 
        "_cache": true 
       } 
      }], 
      "must_not": [{ 
       "term": { 
        "product_collections": 0 
       } 
      }] 
     } 
    }, 
    "sort": [{ 
     "translations.name.jp_sort": { 
      "order": "asc", 
      "nested_path": "translations" 
     } 
    }] 
}

Source

2017-07-10 Chris Barcroft

ES 5.5 introduced a new field type called 'icu_collation_keyword' that solves the problem you're running into. You can read more about it here: https://www.elastic.co/blog/elasticsearch-5-5-0-released – Val

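Indeed, that did solve it. I spent a few hours updating my queries and indexer for the version change, and then tried icu_collation_keyword. It works well, and it's very fast! If you'd like to submit your comment as an answer, I'll mark it as accepted. Thanks!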

ES 5.5 introduced a new field type called icu_collation_keyword that solves the problem you're facing.

You can read more here: https://www.elastic.co/blog/elasticsearch-5-5-0-released
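
As a rough illustration, the nested part of the mapping from the question might look something like this with the new field type. This is only a sketch under a few assumptions: Elasticsearch 5.5 or later with the ICU analysis plugin installed, the same `product` type and `translations.name.jp_sort` field name as in the question, `text` in place of the 2.x `string` type, and `"index": false` on the assumption that the sub-field is used only for sorting and never searched.

```json
{
  "mappings": {
    "product": {
      "properties": {
        "translations": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "text",
              "fields": {
                "jp_sort": {
                  "type": "icu_collation_keyword",
                  "index": false,
                  "language": "ja",
                  "country": "JP"
                }
              }
            },
            "language_id": {
              "type": "short"
            }
          }
        }
      }
    }
  }
}
```

Because the field name is unchanged, the sort clause on translations.name.jp_sort should not need a different field reference. The icu_collation_keyword type writes the collation key into doc values at index time, so sorting reads from disk-backed doc values rather than building fielddata on the heap, which is what was tripping the circuit breaker. The custom japanese_ordering filter and analyzer from the original settings would presumably no longer be needed for this field.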

Source

2017-07-11 15:59:10 Val
