2016-06-30

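Some multi-word synonyms not working in Elasticsearch for nested fields

I am trying to use a synonym analyzer at query time and am not getting the expected results. Can someone shed some light on this?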

Here is my index mapping:

{
  "jobs_user_profile_v2": {
    "mappings": {
      "profile": {
        "_all": {
          "enabled": false
        },
        "_ttl": {
          "enabled": true
        },
        "properties": {
          "rsa": {
            "type": "nested",
            "properties": {
              "answer": {
                "type": "string",
                "index_analyzer": "autocomplete",
                "search_analyzer": "synonym",
                "position_offset_gap": 100
              },
              "answerId": {
                "type": "long"
              },
              "answerOriginal": {
                "type": "string",
                "index": "not_analyzed"
              },
              "createdAt": {
                "type": "long"
              },
              "label": {
                "type": "string",
                "index": "not_analyzed"
              },
              "labelOriginal": {
                "type": "string",
                "index": "not_analyzed"
              },
              "question": {
                "type": "string",
                "index": "not_analyzed"
              },
              "questionId": {
                "type": "long"
              },
              "questionOriginal": {
                "type": "string"
              },
              "source": {
                "type": "integer"
              },
              "updatedAt": {
                "type": "long"
              }
            }
          }
        }
      }
    }
  }
}

The field of interest is rsa.answer, which is the field I am querying.

My synonym mapping:

Beautician,Stylist,Make up artist,Massage therapist,Therapist,Spa,Hair Dresser,Salon,Beauty Parlour,Parlor => Beautician 
Carpenter,Wood Worker,Furniture Carpenter => Carpenter 
Cashier,Store Manager,Store Incharge,Purchase Executive,Billing Executive,Billing Boy => Cashier 
Content Writer,Writer,Translator,Writing,Copywriter,Content Creation,Script Writer,Freelance Writer,Freelance Content Writer => Content Writer 

My search query:

http://{{domain}}/jobs_user_profile_v2/_search 

{
  "query": {
    "nested": {
      "path": "rsa",
      "query": {
        "query_string": {
          "query": "hair dresser",
          "fields": ["answer"],
          "analyzer": "synonym"
        }
      },
      "inner_hits": {
        "explain": true
      }
    }
  },
  "explain": true,
  "sort": [
    { "_score": {} }
  ]
}

It shows the correct Beautician and Cashier profiles for the search queries "Hair Dresser" and "Billing Executive", but returns nothing for the "Wood Worker => Carpenter" case.

My analyzer results:

http://{{domain}}/jobs_user_profile_v2/_analyze?analyzer=synonym&text=hair dresser 


{
  "tokens": [
    {
      "token": "beautician",
      "start_offset": 0,
      "end_offset": 12,
      "type": "SYNONYM",
      "position": 1
    }
  ]
}

Wood worker case:

http://{{domain}}/jobs_user_profile_v2/_analyze?analyzer=synonym&text=wood worker 


{
  "tokens": [
    {
      "token": "carpenter",
      "start_offset": 0,
      "end_offset": 11,
      "type": "SYNONYM",
      "position": 1
    }
  ]
}

It also fails for a few other cases.

My analyzer settings for the index:

"analysis": {
  "filter": {
    "synonym": {
      "ignore_case": "true",
      "type": "synonym",
      "synonyms_path": "synonym.txt"
    },
    "autocomplete_filter": {
      "type": "edge_ngram",
      "min_gram": "3",
      "max_gram": "10"
    }
  },
  "analyzer": {
    "text_en_splitting_search": {
      "type": "custom",
      "filter": [
        "stop",
        "lowercase",
        "porter_stem",
        "word_delimiter"
      ],
      "tokenizer": "whitespace"
    },
    "synonym": {
      "filter": [
        "stop",
        "lowercase",
        "synonym"
      ],
      "type": "custom",
      "tokenizer": "standard"
    },
    "autocomplete": {
      "filter": [
        "lowercase",
        "autocomplete_filter"
      ],
      "type": "custom",
      "tokenizer": "standard"
    },
    "text_en_splitting": {
      "filter": [
        "lowercase",
        "porter_stem",
        "word_delimiter"
      ],
      "type": "custom",
      "tokenizer": "whitespace"
    },
    "text_general": {
      "filter": [
        "lowercase"
      ],
      "type": "custom",
      "tokenizer": "standard"
    },
    "edge_ngram_analyzer": {
      "filter": [
        "lowercase"
      ],
      "type": "custom",
      "tokenizer": "edge_ngram_tokenizer"
    },
    "autocomplete_analyzer": {
      "filter": [
        "lowercase"
      ],
      "tokenizer": "whitespace"
    }
  },
  "tokenizer": {
    "edge_ngram_tokenizer": {
      "token_chars": [
        "letter",
        "digit"
      ],
      "min_gram": "2",
      "type": "edgeNGram",
      "max_gram": "10"
    }
  }
}

Answer

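For this use case, multi_match is a better fit than query_string. Unlike query_string, multi_match does not split the query text into terms before analyzing it. Because query_string splits first, the analyzer sees "wood" and "worker" as separate inputs, so multi-word synonyms may not work as expected.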
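To make the difference concrete, here is a toy Python model of the two behaviours. It is only a sketch, not Elasticsearch code: the `analyze` function is a hypothetical stand-in for the `synonym` analyzer, the rule table holds just the two multi-word carpenter rules from the question, and `analyze_per_term` assumes the query is split on whitespace before analysis, as described above.

```python
# Toy model only: a simplified stand-in for the "synonym" analyzer.
# The rules are the multi-word carpenter rules from the question.
SYNONYMS = {
    ("wood", "worker"): "carpenter",
    ("furniture", "carpenter"): "carpenter",
}


def analyze(text):
    """Lowercase, split on whitespace, then collapse known two-word synonyms."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in SYNONYMS:
            out.append(SYNONYMS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def analyze_whole(query):
    """The analyzer receives the full query text at once (multi_match-like)."""
    return analyze(query)


def analyze_per_term(query):
    """The query is split first and each piece analyzed alone (query_string-like)."""
    out = []
    for term in query.split():
        out.extend(analyze(term))
    return out


print(analyze_whole("wood worker"))     # ['carpenter']
print(analyze_per_term("wood worker"))  # ['wood', 'worker']
```

In the per-term case the synonym rule never sees both words together, so it cannot fire.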

Example:

{
  "query": {
    "nested": {
      "path": "rsa",
      "query": {
        "multi_match": {
          "query": "wood worker",
          "fields": [
            "rsa.answer"
          ],
          "type": "cross_fields",
          "analyzer": "synonym"
        }
      }
    }
  }
}

If for some reason you prefer query_string, you need to wrap the entire query in double quotes so that it is not split into separate terms:

Example:

POST test/_search
{
  "query": {
    "nested": {
      "path": "rsa",
      "query": {
        "query_string": {
          "query": "\"wood worker\"",
          "fields": [
            "rsa.answer"
          ],
          "analyzer": "synonym"
        }
      }
    }
  }
}
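
If the search text comes from user input, the quoting can be done in code. The following is a minimal Python sketch under stated assumptions: `phrase_query_body` is a hypothetical helper name, and the defaults (`rsa`, `rsa.answer`, `synonym`) are taken from the example above. It only builds the request body and does not send it anywhere.

```python
import json


def phrase_query_body(text, path="rsa", field="rsa.answer", analyzer="synonym"):
    """Build a nested query_string body with the whole text quoted as one phrase."""
    # Escape backslashes and embedded double quotes so the input
    # cannot close the surrounding phrase quotes early.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "query_string": {
                        "query": '"' + escaped + '"',
                        "fields": [field],
                        "analyzer": analyzer,
                    }
                },
            }
        }
    }


body = phrase_query_body("wood worker")
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of the search request shown above.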

Thanks @keety, this was helpful.

If we have already given the **path** as **rsa**, is it necessary to give **rsa.answer** in **fields**?