2015-08-18 96 views

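ElasticSearch search getting poor results

I am fairly new to ElasticSearch and am having trouble getting search results that I would consider good. My goal is to search an index of medications (with 6 fields) based on a phrase that the user enters. The phrase could be one or more words. I have tried a few approaches, and I outline the best one I have found so far below. Let me know what I am doing wrong. I am guessing that I am missing something fundamental.

Here is a subset of the fields that I am working with: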

...
"hits": [
    {
        "_index": "indexus2",
        "_type": "Medication",
        "_id": "17471",
        "_score": 8.829264,
        "_source": {
            "SearchContents": " chew chewable oral po tylenol",
            "MedShortDesc": "Tylenol PO Chew",
            "MedLongDesc": "Tylenol Oral Chewable",
            "GenericDesc": "ACETAMINOPHEN ORAL",
            ...
        }
    }
    ...

I want to analyze these fields with an edge NGram analyzer. I am using the C# NEST library for indexing:

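// Edge n-gram tokenizer: emits prefixes of 2 to 50 characters
// for each run of letters or digits.
settings.Analysis.Tokenizers.Add("edgeNGram", new EdgeNGramTokenizer()
{
    MaxGram = 50,
    MinGram = 2,
    TokenChars = new List<string>() { "letter", "digit" }
});

// Custom analyzer: the tokenizer above followed by a lowercase filter.
settings.Analysis.Analyzers.Add("edgeNGramAnalyzer", new CustomAnalyzer()
{
    Filter = new string[] { "lowercase" },
    Tokenizer = "edgeNGram"
});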
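
To make the effect of this analyzer concrete, here is a rough Python sketch of the tokens it would be expected to produce. It is only an approximation, not Elasticsearch's implementation: the function name `edge_ngrams` is hypothetical, and it assumes the `letter`/`digit` token characters from the C# snippet are in effect, so text is split on every other character.

```python
import re


def edge_ngrams(text, min_gram=2, max_gram=50):
    """Approximate an edge n-gram tokenizer followed by a lowercase filter.

    Assumption: token_chars = letter, digit, so any other character
    (space, hyphen, parenthesis, ...) acts as a separator.
    """
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()
        # Emit every prefix whose length lies between min_gram and max_gram.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens


print(edge_ngrams("Tylenol PO"))
# ['ty', 'tyl', 'tyle', 'tylen', 'tyleno', 'tylenol', 'po']
```

Words shorter than `min_gram`, such as the "5" in "5 mg", produce no token at all in this sketch.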

I am running a more_like_this query against the fields in question:

GET indexus2/Medication/_search
{
    "query": {
        "more_like_this": {
            "fields": ["MedShortDesc", "MedLongDesc", "GenericDesc", "SearchContents"],
            "like_text": "vicodin",
            "min_term_freq": 1,
            "max_query_terms": 25,
            "min_word_len": 2
        }
    }
}

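The problem is that when I search for "vicodin", I expect documents that match the full word to come first, but they do not. Below is a subset of the results from this query. Vicodin does not show up until the 7th result: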

"hits": [ 
     { 
      "_index": "indexus2", 
      "_type": "Medication", 
      "_id": "31192", 
      "_score": 4.567309, 
      "_source": { 
       "SearchContents": " oral po victrelis", 
       "MedShortDesc": "Victrelis PO", 
       "MedLongDesc": "Victrelis Oral", 
       "RepresentativeRoutedGenericDesc": "BOCEPREVIR ORAL", 
       ... 
      } 
     } 
     <5 more similar results> 
     { 
      "_index": "indexus2", 
      "_type": "Medication", 
      "_id": "26198", 
      "_score": 2.2836545, 
      "_source": { 
       "SearchContents": " (original 5 500 feeding mg strength) tube via vicodin", 
       "MedShortDesc": "Vicodin 5 mg-500 mg (Original Strength) via feeding tube", 
       "MedLongDesc": "Vicodin 5 mg-500 mg (Original Strength) via feeding tube", 
       "GenericDesc": "HYDROCODONE BITARTRATE/ACETAMINOPHEN ORAL", 
      ... 
      } 
      } 

Field mappings:

"OrderableMedLongDesc": { 
     "type": "string", 
     "analyzer": "edgeNGramAnalyzer" 
}, 
"OrderableMedShortDesc": { 
     "type": "string", 
     "analyzer": "edgeNGramAnalyzer" 
}, 
"RepresentativeRoutedGenericDesc": { 
     "type": "string", 
     "analyzer": "edgeNGramAnalyzer" 
}, 
"SearchContents": { 
     "type": "string", 
     "analyzer": "edgeNGramAnalyzer" 
}, 

Here is what ES shows for the analyzers under _settings:

  "analyzer": { 
      "edgeNGramAnalyzer": { 
       "type": "custom", 
       "filter": [ 
        "lowercase" 
       ], 
       "tokenizer": "edgeNGram" 
       } 
      }, 
      "tokenizer": { 
       "edgeNGram": { 
       "min_gram": "2", 
       "type": "edgeNGram", 
       "max_gram": "50" 
       } 
      } 

Can you post the mapping for the fields? – keety

@keety, I updated the post with a bit more detail. Thanks. – Dennis

Answer

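With the mapping above, edgeNGramAnalyzer is also the search analyzer for these fields, so the search query itself gets "edge ngrammed" as well. You probably do not want this.

Change the mapping so that only the index_analyzer option is set to edgeNGramAnalyzer.

The search_analyzer then defaults to standard.

Example: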

"SearchContents": { 
     "type": "string", 
     "index_analyzer": "edgeNGramAnalyzer" 
}, 
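
The reasoning in this answer can be illustrated with a small Python simulation. This is a sketch under assumptions, not Elasticsearch itself: `edge_ngrams`, `whole_words`, and `matches` are hypothetical helpers, the standard analyzer is approximated as "lowercased whole words", and only two of the documents from the question are indexed.

```python
import re


def edge_ngrams(text, min_gram=2, max_gram=50):
    """Approximate the edge n-gram analyzer (letters/digits, lowercased)."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens


def whole_words(text):
    """Rough stand-in for the standard analyzer: lowercased whole words."""
    return [w.lower() for w in re.findall(r"[^\W_]+", text)]


# Index time: both documents are edge n-grammed.
indexed = {
    "Victrelis PO": set(edge_ngrams("Victrelis PO")),
    "Vicodin 5 mg-500 mg": set(edge_ngrams("Vicodin 5 mg-500 mg")),
}


def matches(query_terms):
    """Return, per document, the query terms found in its indexed terms."""
    wanted = set(query_terms)
    return {name: sorted(terms & wanted) for name, terms in indexed.items()}


# Query analyzed with edge n-grams: "vi" and "vic" also hit Victrelis.
print(matches(edge_ngrams("vicodin")))
# Query analyzed as a whole word: only the Vicodin document matches.
print(matches(whole_words("vicodin")))
```

In the first case the short, shared prefixes let unrelated medications match and compete on score; in the second, the only query term is the full word, which exists only in documents whose indexed prefixes include "vicodin".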
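
This looks much better. Thanks! – Dennis

I found that things were not quite right after all. I was able to pull up my initial example just fine, but other medications that should have been indexed the same way were not found at all. I did find that if I use multi_match instead of more_like_this, everything seems to work. Any idea why that would be? – Dennis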
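
The thread does not show the multi_match request that was used. As a hedged illustration only, the following Python sketch builds a request body of the general shape a multi_match query takes, reusing the field names from the more_like_this query in the question; the helper name `multi_match_body` is hypothetical.

```python
import json


def multi_match_body(text, fields):
    """Build a search request body that runs one query string against several fields."""
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


# Field names taken from the more_like_this query in the question.
body = multi_match_body(
    "vicodin",
    ["MedShortDesc", "MedLongDesc", "GenericDesc", "SearchContents"],
)
print(json.dumps(body, indent=2))
```

The printed JSON could be sent to the same `indexus2/Medication/_search` endpoint in place of the more_like_this body.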