2015-04-22

Elasticsearch: partial and exact matching across multiple fields

Our Account model has first_name, last_name, and ssn (Social Security number).

I want partial matching on first_name and last_name, but exact matching on ssn. This is what I have so far:

settings analysis: {
  filter: {
    substring: {
      type: "nGram",
      min_gram: 3,
      max_gram: 50
    },
    ssn_string: {
      type: "nGram",
      min_gram: 9,
      max_gram: 9
    }
  },
  analyzer: {
    index_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "substring"]
    },
    search_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "substring"]
    },
    ssn_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["ssn_string"]
    }
  }
}

mapping do
  [:first_name, :last_name].each do |attribute|
    indexes attribute, type: 'string',
                       index_analyzer: 'index_ngram_analyzer',
                       search_analyzer: 'search_ngram_analyzer'
  end

  indexes :ssn, type: 'string', index: 'not_analyzed'
end

My search looks like this:

query: {
  multi_match: {
    fields: ["first_name", "last_name", "ssn"],
    query: query,
    type: "cross_fields",
    operator: "and"
  }
}

So this works:

Account.search("erik").records.to_a 

and even this (for Erik Smith):

Account.search("erik smi").records.to_a 

and an SSN on its own:

Account.search("111112222").records.to_a 

but this does not:

Account.search("erik 111112222").records.to_a 

Any ideas whether I am indexing or querying incorrectly?

Thanks for your help!

Answer

Does it have to be done with a single query string? If not, I would do it like this:

PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "_all": {
        "enabled": true,
        "index_analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      },
      "properties": {
        "first_name": {
          "type": "string",
          "include_in_all": true
        },
        "last_name": {
          "type": "string",
          "include_in_all": true
        },
        "ssn": {
          "type": "string",
          "index": "not_analyzed",
          "include_in_all": false
        }
      }
    }
  }
}

Note the use of the _all field. I include first_name and last_name in _all, but not ssn, and ssn is not analyzed at all because I want to match it exactly.
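To see why indexing with ngrams but searching with the standard analyzer matches partial names, here is a rough pure-Ruby simulation. It is not Elasticsearch itself: it assumes the standard tokenizer behaves like a split on non-alphanumeric characters for simple ASCII names, and the helper names (`standard_tokens`, `ngrams`, `index_tokens`, `matches_all?`) are made up for illustration.

```ruby
require 'set'

# Rough stand-in for the standard tokenizer plus lowercase filter
# (assumption: plain ASCII input, split on non-alphanumerics).
def standard_tokens(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# All substrings of a token with length between min and max,
# mimicking an ngram token filter.
def ngrams(token, min, max)
  (min..[max, token.length].min).flat_map do |n|
    (0..token.length - n).map { |i| token[i, n] }
  end
end

# Index-time analysis for _all: standard tokens, lowercased, then ngrams.
def index_tokens(text, min: 2, max: 20)
  standard_tokens(text).flat_map { |t| ngrams(t, min, max) }.to_set
end

# Search-time analysis is the plain standard analyzer; with
# operator "and", every query term must be present in the index.
def matches_all?(indexed, query)
  standard_tokens(query).all? { |term| indexed.include?(term) }
end

all_field = index_tokens("Erik Smith")
puts matches_all?(all_field, "eri smi")   # true
puts matches_all?(all_field, "eri jon")   # false
```

Because the query terms are not themselves split into ngrams, each one has to appear as a whole among the indexed ngrams, which is what makes "eri smi" match Erik Smith and nothing else in this sketch.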

I indexed a couple of documents for illustration:

POST /test_index/doc/_bulk 
{"index":{"_id":1}} 
{"first_name":"Erik","last_name":"Smith","ssn":"111112222"} 
{"index":{"_id":2}} 
{"first_name":"Bob","last_name":"Jones","ssn":"123456789"} 

Then I can query on partial names and filter by the exact SSN:

POST /test_index/doc/_search
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "_all": {
            "query": "eri smi",
            "operator": "and"
          }
        }
      },
      "filter": {
        "term": {
          "ssn": "111112222"
        }
      }
    }
  }
}

And I get back what I'm expecting:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.8838835,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_score": 0.8838835,
        "_source": {
          "first_name": "Erik",
          "last_name": "Smith",
          "ssn": "111112222"
        }
      }
    ]
  }
}

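If you need to be able to use a single query string (with no filter), you could also include ssn in the _all field, but with this setup it will also match partial strings (such as 111112), so that may not be what you want.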
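Another option, which is my own assumption rather than something I tested with the setup above, is to keep the filtered query and split the single string in application code. The sketch below uses a hypothetical helper, `build_account_query`, that treats any 9-digit token as the SSN and everything else as name terms.

```ruby
# Hypothetical helper: split one search string into name terms and an
# optional 9-digit SSN, then build the filtered query shown earlier.
SSN_PATTERN = /\A\d{9}\z/

def build_account_query(input)
  terms = input.split
  # The first 9-digit token is used as the SSN; all 9-digit tokens
  # are kept out of the name query.
  ssn = terms.find { |t| t.match?(SSN_PATTERN) }
  names = terms.reject { |t| t.match?(SSN_PATTERN) }.join(" ")

  name_query =
    if names.empty?
      { match_all: {} }
    else
      { match: { _all: { query: names, operator: "and" } } }
    end

  # Without an SSN there is nothing to filter on.
  return { query: name_query } if ssn.nil?

  { query: { filtered: { query: name_query, filter: { term: { ssn: ssn } } } } }
end

p build_account_query("erik 111112222")
```

With elasticsearch-model, a hash like this can presumably be passed straight to `Account.search`, since it accepts a query definition as a Hash as well as a plain string. The trade-off is that a partial SSN is treated as a name term, so it simply will not match.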

If you only want to match prefixes (i.e., search terms that start at the beginning of a word), you should use edge ngrams.
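As a quick illustration of the difference, here is a small pure-Ruby sketch of what an edge ngram filter emits for one token. The function name `edge_ngrams` is made up for this example, and it only mimics the idea; it is not the Elasticsearch implementation.

```ruby
# Prefixes of a token with length between min and max, mimicking
# an edge ngram token filter anchored at the start of the token.
def edge_ngrams(token, min, max)
  (min..[max, token.length].min).map { |n| token[0, n] }
end

tokens = edge_ngrams("erik", 2, 20)
p tokens                   # ["er", "eri", "erik"]
p tokens.include?("eri")   # true
p tokens.include?("rik")   # false
```

A search for "eri" would still find Erik, but "rik" would not, because only substrings starting at the first character are indexed.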

I wrote a blog post about using ngrams that might help you out a little: http://blog.qbox.io/an-introduction-to-ngrams-in-elasticsearch

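Here is the code I used for this answer. I tried a few different things, including the setup I posted here and another one that includes ssn in _all but uses edge ngrams. Hope this helps: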

http://sense.qbox.io/gist/b6a31c929945ef96779c72c468303ea3bc87320f

Thank you so much for the reply, wow. I will give it a try. I think this also did the trick: once I took out search_analyzer: 'search_ngram_analyzer', it worked (as far as I have tested so far).