Elasticsearch - ngram filter: preserve/keep the original token

I am applying an ngram filter to my string field:

"custom_ngram": { 
    "type": "ngram", 
    "min_gram": 3, 
    "max_gram": 10 
}

As a result, however, I lose the original tokens that are shorter or longer than the ngram range.

For example, original tokens such as "iq" or "a4" can no longer be found.

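I already apply some language-specific analysis before the ngram filter, so I would like to avoid copying the whole field. I am looking for a way to extend the tokens with ngrams while keeping the originals.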

Any ideas or suggestions?

Here is an example of one of my analyzers that uses the custom_ngram filter:

"french": { 
    "type":"custom", 
    "tokenizer": "standard", 
    "filter": [ 
     "french_elision", 
     "lowercase", 
     "french_stop", 
     "custom_ascii_folding", 
     "french_stemmer", 
     "custom_ngram" 
    ] 
}
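The token loss follows from what an ngram filter does: it replaces each incoming token with its substrings of length min_gram to max_gram, and a token shorter than min_gram has no such substrings. The sketch below is a simplified pure-Python model of that behaviour using the question's settings (min_gram 3, max_gram 10); it is not Elasticsearch's implementation, and the helper name `ngram_filter` and the sample tokens are hypothetical.

```python
def ngram_filter(tokens, min_gram=3, max_gram=10):
    """Simplified model of an ngram token filter (hypothetical helper):
    every incoming token is replaced by all of its substrings whose length
    is between min_gram and max_gram. Nothing else is emitted, so the
    original token survives only if its own length falls inside that range."""
    grams = []
    for token in tokens:
        for start in range(len(token)):
            for size in range(min_gram, max_gram + 1):
                end = start + size
                if end > len(token):
                    break  # every remaining size is even longer
                grams.append(token[start:end])
    return grams


# Hypothetical tokens, as if they had already passed the earlier filters.
tokens = ["driving", "audi", "a4", "iq"]
grams = ngram_filter(tokens)

print("a4" in grams)    # False: shorter than min_gram, so no grams at all
print("iq" in grams)    # False, for the same reason
print("driv" in grams)  # True
print("audi" in grams)  # True: length 4 is inside the 3..10 range
```

By the same logic, a token longer than max_gram (for example a 20-character word) never appears whole among the emitted grams.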

2016-07-12 Philipp

I don't think I understand what the problem is. –

For example, because of the ngram filter, the string "driving audi a4" is not matched when searching for "a4". However, "driv", "drivi", ... do match. I need both. – Philipp

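You have no choice other than using multi-fields and indexing the field with a second analyzer that also keeps the short terms. Something like this: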

"text": {
    "type": "string",
    "analyzer": "french",
    "fields": {
        "standard_version": {
            "type": "string",
            "analyzer": "standard"
        }
    }
}

and adjust the query so that it also hits the text.standard_version field.
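One way to make a query "also hit" the sub-field is a `multi_match` query that lists both fields. The sketch below only builds such a request body in Python, assuming it would be sent to the index's `_search` endpoint; the helper name `build_search_body` and the query text are hypothetical. `most_fields` is one reasonable choice for a field and its differently analyzed sub-field, because it combines the scores of every field that matches.

```python
import json


def build_search_body(query_text):
    """Hypothetical helper: a multi_match query that searches both the
    french-analyzed "text" field and its "text.standard_version" sub-field."""
    return {
        "query": {
            "multi_match": {
                "query": query_text,
                # most_fields combines the scores of every field that matches
                "type": "most_fields",
                "fields": ["text", "text.standard_version"],
            }
        }
    }


print(json.dumps(build_search_body("a4"), indent=2))
```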

2016-07-12 13:50:26

As Andrei Stefan pointed out, I had to go with multi_fields.

I did this in my (French) mapping, which now looks like this:

"french_strings": {
    "match": "*_fr",
    "match_mapping_type": "string",
    "mapping": {
        "type": "string",
        "analyzer": "french",
        "fields": {
            "ngram": {
                "type": "string",
                "index": "analyzed",
                "analyzer": "ngram",
                "search_analyzer": "default_search"
            }
        }
    }
}

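I decided to take the ngram filter out of the french analyzer and to use a "custom ngram only" analyzer for the .ngram sub-field. This results in a French-analyzed field plus a "raw-to-ngram" sub-field.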
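The effect of this layout can be modelled in a few lines: each value is indexed twice, once as whole tokens and once as ngrams, and a search term matches if either copy contains it. This is a simplified pure-Python model, not Elasticsearch behaviour; lowercasing and whitespace splitting stand in for the real french and ngram analyzers, and all function and field names are hypothetical.

```python
def ngrams(token, min_gram=3, max_gram=10):
    """All substrings of token with a length between min_gram and max_gram."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


def index_multi_field(value):
    """Hypothetical model: index one value twice, like a field plus its
    .ngram sub-field. Lowercase + whitespace split stand in for the analyzers."""
    tokens = value.lower().split()
    gram_set = set()
    for token in tokens:
        gram_set |= ngrams(token)
    return {"field": set(tokens), "field.ngram": gram_set}


def matches(indexed, term):
    """A term matches if the main field or the sub-field contains it."""
    term = term.lower()
    return any(term in terms for terms in indexed.values())


doc = index_multi_field("Driving Audi A4")
print(matches(doc, "a4"))     # True, found in the whole-token field
print(matches(doc, "drivi"))  # True, found only in the .ngram sub-field
print(matches(doc, "xyz"))    # False
```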

2016-07-12 14:16:44 Philipp