What is an efficient way to search world location names with ElasticSearch?

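I have location information from GeoNames.org parsed into a relational database. Using this information, I am trying to build an ElasticSearch index that contains populated place (city) names, administrative division (state, province, etc.) names, country names and country codes. My goal is to provide a location search similar to Google Maps':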

[Image: Google Maps]

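I don't need the fancy bold highlighting, but I do need the search to return similar results in a similar way. I've tried a mapping with a single location field containing the entire location name (e.g., "Round Rock, TX, United States"), and I've also tried five separate fields, one for each part of the location. I've tried keyword and prefix queries and edgengram analyzers; I haven't found a configuration that works correctly.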

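What kinds of analyzers, both for indexing and for searching, should I use to achieve this? The search doesn't have to be as polished as Google's, but I'd like it to be at least similar.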

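I do want to support partial name matches, which is why I've been experimenting with edgengram. For example, a search for "round r" should match Round Rock, TX, United States. I would also prefer that results whose populated place (city) name begins with the exact search term rank higher than other results. For example, a search for "round ro" should rank Round Rock, TX, United States above Round, Some Province, RO (Romania). I hope that makes it clear.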
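A minimal sketch of one common way to get per-word partial matching, assuming Elasticsearch 7.x or later (where "edge_ngram" and "text" replace the older "edgeNGram" and "string"); the filter and analyzer names are hypothetical. The text is split into words first and each word is then expanded into its prefixes, so "Round Rock" is indexed as r, ro, rou, roun, round and r, ro, roc, rock, and both terms of the query "round r" find a match. A separate search analyzer keeps the query terms themselves from being expanded into prefixes.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "word_prefix_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "word_prefix_index": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "word_prefix_filter"]
        },
        "word_prefix_search": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "populatedPlace": { "type": "text", "analyzer": "word_prefix_index", "search_analyzer": "word_prefix_search" },
      "administrativeDivision": { "type": "text", "analyzer": "word_prefix_index", "search_analyzer": "word_prefix_search" },
      "administrativeDivisionAbbreviation": { "type": "text", "analyzer": "word_prefix_index", "search_analyzer": "word_prefix_search" },
      "country": { "type": "text", "analyzer": "word_prefix_index", "search_analyzer": "word_prefix_search" },
      "countryCode": { "type": "text", "analyzer": "word_prefix_index", "search_analyzer": "word_prefix_search" },
      "population": { "type": "long" }
    }
  }
}
```

The standard tokenizer already splits on commas, so a comma-stripping char filter is not needed with this layout.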

This is my current index configuration (a C# anonymous type that is later serialized to JSON and passed to the ElasticSearch API):

var indexDefinition = new
{
    settings = new
    {
        index = new
        {
            number_of_shards = 1,
            number_of_replicas = 0,
            refresh_interval = -1,
            analysis = new
            {
                analyzer = new
                {
                    // Used at index time. index_tokenizer is an edgeNGram tokenizer with no token_chars,
                    // so it emits prefixes of the whole field value ("Round Rock" -> "r", "ro", ...,
                    // "round rock"), not prefixes of each individual word.
                    edgengram_index_analyzer = new
                    {
                        type = "custom",
                        tokenizer = "index_tokenizer",
                        filter = new[] { "lowercase", "asciifolding" },
                        char_filter = new[] { "no_commas_char_filter" },
                        stopwords = new object[0]
                    },
                    // Intended for search time, but no field in the mapping below references it.
                    search_analyzer = new
                    {
                        type = "custom",
                        tokenizer = "standard",
                        filter = new[] { "lowercase", "asciifolding" },
                        char_filter = new[] { "no_commas_char_filter" },
                        stopwords = new object[0]
                    }
                },
                tokenizer = new
                {
                    index_tokenizer = new
                    {
                        type = "edgeNGram",
                        min_gram = 1,
                        max_gram = 100
                    }
                },
                char_filter = new
                {
                    // Removes commas: "Round Rock, TX" -> "Round Rock TX"
                    no_commas_char_filter = new
                    {
                        type = "mapping",
                        mappings = new[] { ",=>" }
                    }
                }
            }
        }
    },
    mappings = new
    {
        location = new
        {
            _all = new { enabled = false },
            // Each text field sets only index_analyzer; none sets search_analyzer.
            properties = new
            {
                populatedPlace = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
                administrativeDivision = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
                administrativeDivisionAbbreviation = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
                country = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
                countryCode = new { index_analyzer = "edgengram_index_analyzer", type = "string" },
                population = new { type = "long" }
            }
        }
    }
};
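To rank places whose name starts with the typed text above other matches, one option is a bool query: a must clause that requires every query term to appear in some location field, plus a boosted should clause on populatedPlace alone. This is a sketch that assumes a recent Elasticsearch version and text fields indexed with a per-word edge n-gram analyzer (each word expanded into its prefixes) paired with a plain lowercase search analyzer; the boost of 2 is an arbitrary starting value. Under those assumptions "round ro" satisfies the must clause for both Round Rock, TX and Round, Some Province, RO, but only Round Rock also matches the match_phrase_prefix clause ("round" followed by a word starting with "ro"), so it should score higher.

```json
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "round ro",
          "type": "cross_fields",
          "operator": "and",
          "fields": [
            "populatedPlace",
            "administrativeDivision",
            "administrativeDivisionAbbreviation",
            "country",
            "countryCode"
          ]
        }
      },
      "should": {
        "match_phrase_prefix": {
          "populatedPlace": {
            "query": "round ro",
            "boost": 2
          }
        }
      }
    }
  }
}
```

cross_fields treats the listed fields as one combined field, which suits queries like "round rock tx" where each term lives in a different field.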

2013-11-21 NathanAldenSr

karmi on #elasticsearch IRC suggested that I look at the experimental "suggester" feature in ElasticSearch. Suggesters seem to fit my needs better than prefix queries or edgengrams. – NathanAldenSr
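For reference, the completion suggester needs a dedicated field of type completion. A minimal create-index body, assuming Elasticsearch 7.x or later; the field name "suggest" is hypothetical:

```json
{
  "mappings": {
    "properties": {
      "populatedPlace": { "type": "text" },
      "population": { "type": "long" },
      "suggest": { "type": "completion" }
    }
  }
}
```

Each document then supplies the strings to complete against, for example "suggest": { "input": ["Round Rock, TX, United States"], "weight": 1 }. The weight is an integer that orders suggestions, so a value derived from population is one possible choice.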

This might be what you're looking for:

{
  "settings": {
    "analysis": {
      "tokenizer": {
        "name_tokenizer": {
          "type": "edgeNGram",
          "max_gram": 100,
          "min_gram": 2,
          "side": "front"
        }
      },
      "analyzer": {
        "name_analyzer": {
          "tokenizer": "whitespace",
          "type": "custom",
          "filter": ["lowercase", "multi_words", "name_filter"]
        }
      },
      "filter": {
        "multi_words": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 10
        },
        "name_filter": {
          "type": "edgeNGram",
          "max_gram": 100,
          "min_gram": 2,
          "side": "front"
        }
      }
    }
  }
}

I think using name_analyzer would replicate the Google search you're describing. You can tweak the configuration a little to suit your needs.
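One way to see what an analyzer like name_analyzer produces is the _analyze API. The request body below is a sketch assuming Elasticsearch 5.x or later, which accepts inline filter definitions and the edge_ngram filter name without the deprecated side setting. max_shingle_size is lowered to 3 here because newer versions may reject shingle ranges wider than index.max_shingle_diff (3 by default).

```json
{
  "tokenizer": "whitespace",
  "filter": [
    "lowercase",
    { "type": "shingle", "min_shingle_size": 2, "max_shingle_size": 3 },
    { "type": "edge_ngram", "min_gram": 2, "max_gram": 100 }
  ],
  "text": "Round Rock"
}
```

For "Round Rock" the output should include prefixes of each word (ro through round, ro through rock) as well as prefixes of the shingle "round rock" such as "round r", which is why a query for "round r" can match.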

2013-11-21 17:39:52

Thanks, I'll definitely compare it with a suggester implementation. By the way, 'side' is deprecated in the latest ES versions. – NathanAldenSr

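I ended up going with the completion suggester. I'm not entirely sure I'm using it correctly, but it was easy to get this kind of search working. – NathanAldenSr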
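A minimal completion suggester request body for the _search endpoint, sketched for Elasticsearch 5.x or later (older versions used a separate _suggest endpoint). It assumes a completion field named "suggest" whose inputs are full strings such as "Round Rock, TX, United States"; the suggestion name "places" is arbitrary.

```json
{
  "suggest": {
    "places": {
      "prefix": "round r",
      "completion": {
        "field": "suggest",
        "size": 5
      }
    }
  }
}
```

Suggestions come back ordered by weight, so whatever value is indexed as the weight (for example, one derived from population) decides which of several prefix matches appears first.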

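OK, yes, the 'suggester' does seem like the right solution for you. What I provided is quite old, from before 'suggester' was added to 'elasticsearch'. As you correctly pointed out, 'side' has been deprecated; 'edgeNgram' now uses 'front' as the default 'side'. –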
