2015-06-03

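Elasticsearch - clean but without an analyzer

I am working on a problem that requires exact phrase matching in Elasticsearch. For example, if the term "Brighton Pier" is searched for, a document should match on a search for "Brighton Pier", but not on a search for "Brighton" or "Pier" alone.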

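I have worked out how to do this simply by setting the field being searched to not_analyzed.

However, with no analysis at all, stop words, casing, and so on affect the results.

So is there a way to skip analysis but still clean the text? Of course, you could clean the values yourself before adding them to the index, and clean the search term too, but that is tedious!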

Answer

I think you can get what you want with the keyword tokenizer and the lowercase filter.
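
To make the idea concrete, here is a minimal Python sketch that imitates what such an analysis chain does to a field value. It is not Elasticsearch code: `keyword_tokenizer`, `whitespace_tokenizer`, `lowercase_filter`, and `analyze` are hypothetical helper names used only for illustration, and the whitespace split is just a rough stand-in for a word-splitting tokenizer, shown for contrast.

```python
# Illustrative simulation only; these helpers are hypothetical,
# not part of any Elasticsearch client or API.

def keyword_tokenizer(text):
    # The keyword tokenizer emits the entire input as one token.
    return [text]

def whitespace_tokenizer(text):
    # Rough stand-in for a word-splitting tokenizer, for contrast.
    return text.split()

def lowercase_filter(tokens):
    # The lowercase token filter lowercases every token.
    return [token.lower() for token in tokens]

def analyze(text, tokenizer):
    # Tokenize first, then apply the token filter.
    return lowercase_filter(tokenizer(text))

print(analyze("Brighton Pier", keyword_tokenizer))     # ['brighton pier']
print(analyze("Brighton Pier", whitespace_tokenizer))  # ['brighton', 'pier']
```

With the keyword tokenizer the whole value stays a single term, so a search for "Pier" alone has nothing to match, while the lowercase filter removes the case sensitivity that not_analyzed would keep.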

Here is a simple example. I set up an index like this, using a custom analyzer:

PUT /test_index
{
    "settings": {
        "number_of_shards": 1,
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"]
                }
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "text_field": {
                    "type": "string",
                    "analyzer": "my_analyzer"
                }
            }
        }
    }
}

Then I added a couple of documents:

POST /test_index/doc/_bulk 
{"index":{"_id":1}} 
{"text_field":"Brighton Pier"} 
{"index":{"_id":2}} 
{"text_field":"West Pier"} 

It helps to take a look at the terms the analyzer produces:

POST /test_index/_search?search_type=count
{
    "aggs": {
        "text_field_terms": {
            "terms": {
                "field": "text_field"
            }
        }
    }
}
...
{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "text_field_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "brighton pier",
                    "doc_count": 1
                },
                {
                    "key": "west pier",
                    "doc_count": 1
                }
            ]
        }
    }
}

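Because the same analyzer is used for both indexing and searching (I didn't specify them separately), a match query works regardless of case, since the query text is lowercased as well. Both of the following queries return the document: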

POST /test_index/_search
{
    "query": {
        "match": {
            "text_field": "Brighton Pier"
        }
    }
}
...
{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "test_index",
                "_type": "doc",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "text_field": "Brighton Pier"
                }
            }
        ]
    }
}

POST /test_index/_search
{
    "query": {
        "term": {
            "text_field": "brighton pier"
        }
    }
}
...
{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "test_index",
                "_type": "doc",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "text_field": "Brighton Pier"
                }
            }
        ]
    }
}

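However, if I use a term query (or filter), only the lowercase version returns a result, because a term query is not analyzed and is compared against the indexed term exactly as given.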
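
The difference between the two query types can be sketched in plain Python with a toy in-memory inverted index. This is only a model of the behaviour described here, not Elasticsearch code: `index_doc`, `match_query`, and `term_query` are hypothetical names, and `analyze` simply mimics a keyword tokenizer followed by a lowercase filter.

```python
# Toy model of the behaviour described above; not Elasticsearch code.
# All function names are hypothetical and chosen for illustration.

def analyze(text):
    # keyword tokenizer + lowercase filter: one lowercased token per value
    return [text.lower()]

# Inverted index: term -> set of document ids
index = {}

def index_doc(doc_id, text):
    # Field values are analyzed at index time.
    for term in analyze(text):
        index.setdefault(term, set()).add(doc_id)

def match_query(text):
    # A match query analyzes the query text before looking it up.
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits

def term_query(term):
    # A term query looks up the term exactly as given, with no analysis.
    return index.get(term, set())

index_doc(1, "Brighton Pier")
index_doc(2, "West Pier")

print(match_query("Brighton Pier"))  # {1}
print(match_query("brighton pier"))  # {1}
print(term_query("brighton pier"))   # {1}
print(term_query("Brighton Pier"))   # set()
print(match_query("Pier"))           # set()
```

In this model the mixed-case term lookup fails only because the index holds the lowercased term, and a partial value such as "Pier" never matches because each field value is stored as a single term.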

Here is the code I used to play around with it:

http://sense.qbox.io/gist/d13a463af383c6fc5ad00d86bc27947c0016cf8f