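Elasticsearch: partial and exact matching across multiple fields

Our Account model has first_name, last_name, and ssn (Social Security number).

I want partial matching on first_name and last_name, but exact matching on ssn. This is what I have so far: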

settings analysis: {
  filter: {
    substring: {
      type: "nGram",
      min_gram: 3,
      max_gram: 50
    },
    ssn_string: {
      type: "nGram",
      min_gram: 9,
      max_gram: 9
    }
  },
  analyzer: {
    index_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "substring"]
    },
    search_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "substring"]
    },
    ssn_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["ssn_string"]
    }
  }
}

mapping do
  [:first_name, :last_name].each do |attribute|
    indexes attribute, type: 'string',
                       index_analyzer: 'index_ngram_analyzer',
                       search_analyzer: 'search_ngram_analyzer'
  end

  indexes :ssn, type: 'string', index: 'not_analyzed'
end

My search looks like this:

query: {
  multi_match: {
    fields: ["first_name", "last_name", "ssn"],
    query: query,
    type: "cross_fields",
    operator: "and"
  }
}

So this works:

Account.search("erik").records.to_a

and even this (for Erik Smith):

Account.search("erik smi").records.to_a

as does a search by SSN:

Account.search("111112222").records.to_a

But this does not:

Account.search("erik 111112222").records.to_a

Any ideas on whether I am indexing or querying incorrectly?

Thanks for your help!

Source

2015-04-22 axiom_chicago

Does it have to be done with a single query string? If not, I would do this:

PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "_all": {
        "enabled": true,
        "index_analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      },
      "properties": {
        "first_name": {
          "type": "string",
          "include_in_all": true
        },
        "last_name": {
          "type": "string",
          "include_in_all": true
        },
        "ssn": {
          "type": "string",
          "index": "not_analyzed",
          "include_in_all": false
        }
      }
    }
  }
}

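Note the use of the _all field. I included first_name and last_name in _all but not ssn, and ssn is not analyzed at all, because I want to match it exactly.

I indexed a couple of documents for illustration: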
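To make the index-time versus search-time asymmetry concrete, here is a rough simulation in plain Ruby. It is only a sketch: whitespace splitting stands in for the standard tokenizer, and the helper names (ngrams, index_terms, search_terms, matches_all?) are made up for illustration and are not part of Elasticsearch or any gem.

```ruby
# Rough simulation of the _all analyzers in the mapping above.
# Assumption: whitespace splitting approximates the standard tokenizer.

# All substrings of `token` with length between min and max (like an ngram filter).
def ngrams(token, min = 2, max = 20)
  grams = []
  (0...token.length).each do |start|
    (min..max).each do |len|
      break if start + len > token.length
      grams << token[start, len]
    end
  end
  grams.uniq
end

# Index side: lowercase, split into words, expand every word into n-grams.
def index_terms(text)
  text.downcase.split.flat_map { |word| ngrams(word) }.uniq
end

# Search side: lowercase and split only, mirroring "search_analyzer": "standard".
def search_terms(text)
  text.downcase.split
end

# With "operator": "and", every search term must be among the indexed terms.
def matches_all?(indexed_text, query)
  indexed = index_terms(indexed_text)
  search_terms(query).all? { |term| indexed.include?(term) }
end

all_field = "Erik Smith" # first_name and last_name as copied into _all
puts matches_all?(all_field, "eri smi") # true
puts matches_all?(all_field, "eri jon") # false
```

Because the query terms are not broken into n-grams themselves, each one has to appear as a whole gram of some indexed word, which is what keeps partial matching from becoming too loose.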

POST /test_index/doc/_bulk 
{"index":{"_id":1}} 
{"first_name":"Erik","last_name":"Smith","ssn":"111112222"} 
{"index":{"_id":2}} 
{"first_name":"Bob","last_name":"Jones","ssn":"123456789"}

Then I can query on partial names and filter by the exact SSN:

POST /test_index/doc/_search
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "_all": {
            "query": "eri smi",
            "operator": "and"
          }
        }
      },
      "filter": {
        "term": {
          "ssn": "111112222"
        }
      }
    }
  }
}
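Since the question uses Ruby, the same request body could be built as a plain hash. This is a minimal sketch: name_and_ssn_query is a hypothetical helper name, and it assumes the model's search method accepts a hash body (as elasticsearch-model's search does, to my knowledge).

```ruby
require "json"

# Hypothetical helper: builds the filtered query shown above as a Ruby hash.
def name_and_ssn_query(name_fragment, ssn)
  {
    query: {
      filtered: {
        # Partial name match against the n-grammed _all field.
        query: {
          match: {
            _all: { query: name_fragment, operator: "and" }
          }
        },
        # Exact match against the not_analyzed ssn field.
        filter: {
          term: { ssn: ssn }
        }
      }
    }
  }
end

body = name_and_ssn_query("eri smi", "111112222")
puts JSON.pretty_generate(body)
```

Under that assumption, something like Account.search(body).records.to_a would run it from the model.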

And I get back what I'm expecting:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.8838835,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_score": 0.8838835,
        "_source": {
          "first_name": "Erik",
          "last_name": "Smith",
          "ssn": "111112222"
        }
      }
    ]
  }
}

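If you need to be able to use a single query string (no filter), you could also include ssn in the _all field, but with this setup it will also match partial strings (such as 111112), so that may not be what you want.

If you only want to match prefixes (i.e., search terms that start at the beginning of a word), you should use edge ngrams.

I wrote a blog post about using ngrams that might help you out a bit: http://blog.qbox.io/an-introduction-to-ngrams-in-elasticsearch

Here is the code I used for this answer. I tried a few different things, including the setup I posted here and another one that includes ssn in _all, but with edge ngrams. Hope this helps: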
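Another option for keeping a single search box, offered as an assumption on my part and not something I tested against the setup above, is to split the input in application code and send the SSN-looking token as the exact-match filter. The helper name split_account_query and the nine-digit rule are hypothetical.

```ruby
# Assumption: an SSN is typed as exactly nine digits with no separators.
SSN_PATTERN = /\A\d{9}\z/

# Hypothetical helper: separates an SSN-looking token from the name fragments.
def split_account_query(text)
  ssn_tokens, name_tokens = text.split.partition { |token| token.match?(SSN_PATTERN) }
  { name: name_tokens.join(" "), ssn: ssn_tokens.first }
end

parts = split_account_query("erik 111112222")
puts parts[:name] # erik
puts parts[:ssn]  # 111112222
```

The name part can then go to the partial-match query and the ssn part to a term filter, so a partial SSN such as 111112 never matches.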
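The difference between the two can be sketched in a few lines of Ruby. This only imitates what the token filters produce for a single word; the function names are illustrative and the min/max values are the ones from the settings above.

```ruby
# Every substring with length between min and max (plain ngrams).
def ngrams(token, min, max)
  grams = []
  (0...token.length).each do |start|
    (min..max).each do |len|
      break if start + len > token.length
      grams << token[start, len]
    end
  end
  grams.uniq
end

# Only substrings anchored at the start of the word (edge ngrams).
def edge_ngrams(token, min, max)
  upper = [max, token.length].min
  (min..upper).map { |len| token[0, len] }
end

p edge_ngrams("smith", 2, 20)                 # ["sm", "smi", "smit", "smith"]
p ngrams("smith", 2, 20).include?("mit")      # true
p edge_ngrams("smith", 2, 20).include?("mit") # false
```

So "smi" matches either way, while "mit" matches only with plain ngrams.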

http://sense.qbox.io/gist/b6a31c929945ef96779c72c468303ea3bc87320f

Source

2015-04-22 20:20:40

Thank you so much for the reply - wow. I'll try it out. I think this also did the trick: once I took out search_analyzer: search_ngram_analyzer, it worked (as far as I have tested so far).
