Elasticsearch - sanitized but not analyzed

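I am working on a problem that requires exact matching of a whole value in Elasticsearch. For example, a search for "Brighton Pier" should match the whole value "Brighton Pier", not the separate words "Brighton" and "Pier".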

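I have worked out how to do this simply by setting the field being searched to not_analyzed.

However, not analyzing the field means that stop words, casing and so on affect the results.

So is there a way to skip analysis but still sanitize the text? Of course, you could clean up both the values before indexing them and the search terms themselves, but that is tedious!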

2015-06-03 redrubia

I think you can get what you want with the keyword tokenizer and the lowercase filter.

Here is a simple example. I set up an index like this, using a custom analyzer:

PUT /test_index 
{ 
    "settings": { 
     "number_of_shards": 1, 
     "analysis": { 
      "analyzer": { 
       "my_analyzer": { 
        "type": "custom", 
        "tokenizer": "keyword", 
        "filter": ["lowercase"] 
       } 
      } 
     } 
    }, 
    "mappings": { 
     "doc": { 
      "properties": { 
       "text_field": { 
        "type": "string", 
        "analyzer": "my_analyzer" 
       } 
      } 
     } 
    } 
}
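To make the effect of this analyzer concrete, here is a rough pure-Python emulation of the two stages. It is only a sketch for illustration, not Elasticsearch code: the function names `keyword_tokenizer`, `lowercase_filter` and `my_analyzer` are hypothetical stand-ins for the components named in the settings above.

```python
def keyword_tokenizer(text):
    # Emulates the keyword tokenizer: the whole input becomes one token,
    # so "Brighton Pier" is NOT split on whitespace.
    return [text]


def lowercase_filter(tokens):
    # Emulates the lowercase token filter: each token is lowercased.
    return [token.lower() for token in tokens]


def my_analyzer(text):
    # Tokenizer first, then the filter chain, mirroring the custom analyzer.
    return lowercase_filter(keyword_tokenizer(text))


print(my_analyzer("Brighton Pier"))  # ['brighton pier']
```

Because the tokenizer emits a single token, any further token filter in the chain sees the whole string at once. A stop filter, for instance, would only drop a value that is a stop word in its entirety, not stop words inside a longer value.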

Then I added a couple of documents:

POST /test_index/doc/_bulk 
{"index":{"_id":1}} 
{"text_field":"Brighton Pier"} 
{"index":{"_id":2}} 
{"text_field":"West Pier"}

It helps to look at the terms generated by the analyzer:

POST /test_index/_search?search_type=count 
{ 
    "aggs": { 
     "text_field_terms": { 
      "terms": { 
       "field": "text_field" 
      } 
     } 
    } 
} 
... 
{ 
    "took": 2, 
    "timed_out": false, 
    "_shards": { 
     "total": 1, 
     "successful": 1, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 2, 
     "max_score": 0, 
     "hits": [] 
    }, 
    "aggregations": { 
     "text_field_terms": { 
      "doc_count_error_upper_bound": 0, 
      "sum_other_doc_count": 0, 
      "buckets": [ 
       { 
        "key": "brighton pier", 
        "doc_count": 1 
       }, 
       { 
        "key": "west pier", 
        "doc_count": 1 
       } 
      ] 
     } 
    } 
}
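The buckets above can be reproduced with a small Python sketch that counts, per indexed term, how many documents contain it. This is an illustration under simplifying assumptions: `my_analyzer` is a hypothetical one-line stand-in for the custom analyzer, and the sort order (count descending, then key) is chosen to match the response shown.

```python
from collections import Counter

# The two documents that were bulk-indexed.
docs = {1: "Brighton Pier", 2: "West Pier"}


def my_analyzer(text):
    # Stand-in for keyword tokenizer + lowercase filter: one lowercased token.
    return [text.lower()]


# Count how many documents contain each term.
term_counts = Counter()
for doc_id, text in docs.items():
    for term in set(my_analyzer(text)):
        term_counts[term] += 1

# Order buckets by document count (descending), then by key.
buckets = [
    {"key": key, "doc_count": count}
    for key, count in sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))
]
print(buckets)
```

Each document contributes exactly one term, which is why the aggregation returns whole lowercased values rather than individual words.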

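Since the same analyzer is used for both indexing and searching (I did not specify them separately), a match query finds the document regardless of the casing of the search text. A term query also works, provided the term is already lowercased. Both of the following queries return the document: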

POST /test_index/_search 
{ 
    "query": { 
     "match": { 
      "text_field": "Brighton Pier" 
     } 
    } 
} 
... 
{ 
    "took": 1, 
    "timed_out": false, 
    "_shards": { 
     "total": 1, 
     "successful": 1, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 1, 
     "max_score": 1, 
     "hits": [ 
      { 
       "_index": "test_index", 
       "_type": "doc", 
       "_id": "1", 
       "_score": 1, 
       "_source": { 
        "text_field": "Brighton Pier" 
       } 
      } 
     ] 
    } 
} 

POST /test_index/_search 
{ 
    "query": { 
     "term": { 
      "text_field": "brighton pier" 
     } 
    } 
} 
... 
{ 
    "took": 1, 
    "timed_out": false, 
    "_shards": { 
     "total": 1, 
     "successful": 1, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 1, 
     "max_score": 1, 
     "hits": [ 
      { 
       "_index": "test_index", 
       "_type": "doc", 
       "_id": "1", 
       "_score": 1, 
       "_source": { 
        "text_field": "Brighton Pier" 
       } 
      } 
     ] 
    } 
}

However, a term query (or filter) does not analyze its input, so with a term query only the lowercase version returns a result.
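The difference between the two query types can be sketched in plain Python. This is a simplified model, not how Elasticsearch is implemented: `match_query` and `term_query` are hypothetical helper names, and the "index" is just a dictionary from term to document ids.

```python
docs = {1: "Brighton Pier", 2: "West Pier"}


def my_analyzer(text):
    # Stand-in for keyword tokenizer + lowercase filter: one lowercased token.
    return [text.lower()]


# Build a tiny inverted index: term -> set of document ids.
index = {}
for doc_id, text in docs.items():
    for term in my_analyzer(text):
        index.setdefault(term, set()).add(doc_id)


def match_query(text):
    # A match query analyzes the search text before looking it up.
    hits = set()
    for term in my_analyzer(text):
        hits |= index.get(term, set())
    return sorted(hits)


def term_query(term):
    # A term query looks up the given term as-is, with no analysis.
    return sorted(index.get(term, set()))


print(match_query("Brighton Pier"))  # [1]
print(term_query("brighton pier"))   # [1]
print(term_query("Brighton Pier"))   # []
print(match_query("Pier"))           # []
```

The last line shows the behaviour the question asks for: a search for just "Pier" does not match, because the only indexed terms are the whole lowercased values.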

Here is the code I used to play around with it:

http://sense.qbox.io/gist/d13a463af383c6fc5ad00d86bc27947c0016cf8f

2015-06-03 16:50:49