2012-12-07 71 views
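Unable to search for a string within a string in an Elasticsearch index

I am trying to set up the mapping of my Elasticsearch instance for full-name matching and partial-name matching. I create the index with the mapping below and then populate it with some data: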

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '{ 
    "mappings": { 
    "venue": { 
     "properties": { 
     "location": { 
      "type": "geo_point" 
     }, 
     "name": { 
      "fields": { 
      "name": { 
       "type": "string", 
       "analyzer": "full_name" 
      }, 
      "partial": { 
       "search_analyzer": "full_name", 
       "index_analyzer": "partial_name", 
       "type": "string" 
      } 
      }, 
      "type": "multi_field" 
     } 
     } 
    } 
    }, 
    "settings": { 
    "analysis": { 
     "filter": { 
     "swedish_snow": { 
      "type": "snowball", 
      "language": "Swedish" 
     }, 
     "name_synonyms": { 
      "type": "synonym", 
      "synonyms_path": "name_synonyms.txt" 
     }, 
     "name_ngrams": { 
      "side": "front", 
      "min_gram": 2, 
      "max_gram": 50, 
      "type": "edgeNGram" 
     } 
     }, 
     "analyzer": { 
     "full_name": { 
      "filter": [ 
      "standard", 
      "lowercase" 
      ], 
      "type": "custom", 
      "tokenizer": "standard" 
     }, 
     "partial_name": { 
      "filter": [ 
      "swedish_snow", 
      "lowercase", 
      "name_synonyms", 
      "name_ngrams", 
      "standard" 
      ], 
      "type": "custom", 
      "tokenizer": "standard" 
     } 
     } 
    } 
    } 
}' 

curl -XPOST 'http://127.0.0.1:9200/_bulk?pretty=1' -d ' 
{"index" : {"_index" : "test", "_type" : "venue"}} 
{"location" : [59.3366, 18.0315], "name" : "johnssons"} 
{"index" : {"_index" : "test", "_type" : "venue"}} 
{"location" : [59.3366, 18.0315], "name" : "johnsson"} 
{"index" : {"_index" : "test", "_type" : "venue"}} 
{"location" : [59.3366, 18.0315], "name" : "jöhnsson"} 
' 

Running some search tests. Full name:

curl -XGET 'http://127.0.0.1:9200/test/venue/_search?pretty=1' -d '{ 
    "query": { 
    "bool": { 
     "should": [ 
     { 
      "text": { 
      "name": { 
       "boost": 1, 
       "query": "johnsson" 
      } 
      } 
     }, 
     { 
      "text": { 
      "name.partial": "johnsson" 
      } 
     } 
     ] 
    } 
    } 
}' 

Result:

{ 
    "took": 3, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 2, 
    "max_score": 0.29834434, 
    "hits": [ 
     { 
     "_index": "test", 
     "_type": "venue", 
     "_id": "CAO-dDr2TFOuCM4pFfNDSw", 
     "_score": 0.29834434, 
     "_source": { 
      "location": [ 
      59.3366, 
      18.0315 
      ], 
      "name": "johnsson" 
     } 
     }, 
     { 
     "_index": "test", 
     "_type": "venue", 
     "_id": "UQWGn8L9Squ5RYDMd4jqKA", 
     "_score": 0.14663845, 
     "_source": { 
      "location": [ 
      59.3366, 
      18.0315 
      ], 
      "name": "johnssons" 
     } 
     } 
    ] 
    } 
} 

Partial name:

curl -XGET 'http://127.0.0.1:9200/test/venue/_search?pretty=1' -d '{ 
    "query": { 
    "bool": { 
     "should": [ 
     { 
      "text": { 
      "name": { 
       "boost": 1, 
       "query": "johns" 
      } 
      } 
     }, 
     { 
      "text": { 
      "name.partial": "johns" 
      } 
     } 
     ] 
    } 
    } 
}' 

Result:

{ 
    "took": 3, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 2, 
    "max_score": 0.14663845, 
    "hits": [ 
     { 
     "_index": "test", 
     "_type": "venue", 
     "_id": "UQWGn8L9Squ5RYDMd4jqKA", 
     "_score": 0.14663845, 
     "_source": { 
      "location": [ 
      59.3366, 
      18.0315 
      ], 
      "name": "johnssons" 
     } 
     }, 
     { 
     "_index": "test", 
     "_type": "venue", 
     "_id": "CAO-dDr2TFOuCM4pFfNDSw", 
     "_score": 0.016878016, 
     "_source": { 
      "location": [ 
      59.3366, 
      18.0315 
      ], 
      "name": "johnsson" 
     } 
     } 
    ] 
    } 
} 

Name within name:

curl -XGET 'http://127.0.0.1:9200/test/venue/_search?pretty=1' -d '{ 
    "query": { 
    "bool": { 
     "should": [ 
     { 
      "text": { 
      "name": { 
       "boost": 1, 
       "query": "johnssons" 
      } 
      } 
     }, 
     { 
      "text": { 
      "name.partial": "johnssons" 
      } 
     } 
     ] 
    } 
    } 
}' 

Result:

{ 
    "took": 2, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 1, 
    "max_score": 0.39103588, 
    "hits": [ 
     { 
     "_index": "test", 
     "_type": "venue", 
     "_id": "UQWGn8L9Squ5RYDMd4jqKA", 
     "_score": 0.39103588, 
     "_source": { 
      "location": [ 
      59.3366, 
      18.0315 
      ], 
      "name": "johnssons" 
     } 
     } 
    ] 
    } 
} 

As you can see, I only get one venue back, which is johnssons. Shouldn't I get both johnssons and johnsson back? What am I doing wrong in my settings?

Answer

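You are using the full_name analyzer as the search analyzer for the name.partial field. As a result, your query is translated into a query for the single term johnssons, which doesn't match anything in that field.

You can use the Analyze API to see how your records are indexed. For example, this command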

curl -XGET 'http://127.0.0.1:9200/test/_analyze?analyzer=partial_name&pretty=1' -d 'johnssons' 

will show you that at index time the string "johnssons" is translated into the following terms: "jo", "joh", "john", "johns", "johnss", "johnsso", "johnsson". While this command

curl -XGET 'http://127.0.0.1:9200/test/_analyze?analyzer=full_name&pretty=1' -d 'johnssons' 

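will show you that at search time the string "johnssons" is translated into the term "johnssons". As you can see, there is no match between your search term and your indexed data here.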
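The mismatch can also be reproduced without a running cluster. The sketch below is only an approximation: `edge_ngrams` and `term_matches` are hypothetical helpers written for this illustration, not part of Elasticsearch, and the stemming step ("johnssons" becoming "johnsson") is hard-coded on the assumption that it matches the Analyze API output described above.

```shell
#!/bin/sh
# Hypothetical helper: list the front edge n-grams of a term, roughly
# mimicking the name_ngrams filter (min_gram=2, max_gram=50).
edge_ngrams() {
  term=$1
  min=$2
  max=$3
  len=${#term}
  # Never ask for more characters than the term has.
  if [ "$len" -lt "$max" ]; then max=$len; fi
  i=$min
  while [ "$i" -le "$max" ]; do
    printf '%s\n' "$term" | cut -c "1-$i"
    i=$((i + 1))
  done
}

# Hypothetical helper: succeed if the term ($2) is exactly one of the
# newline-separated indexed terms ($1).
term_matches() {
  printf '%s\n' "$1" | grep -qxF -- "$2"
}

# Index side: assume the stemmer has already reduced "johnssons" to "johnsson".
index_terms=$(edge_ngrams johnsson 2 50)

# Search side: full_name keeps the query as the single term "johnssons".
if term_matches "$index_terms" johnssons; then
  echo "match"
else
  echo "no match"
fi
```

The longest indexed term is "johnsson", so the unstemmed search term "johnssons" can never be equal to any of them, which is why the name.partial clause contributes nothing for that query.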

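Thank you for your answer. So to get a fuzzier result, one would have to use partial_name as the search_analyzer. But that would lead to inaccurate search results... How should I think about this case? Maybe specify a new front edgeNGram analyzer with a higher min_gram and boost it? How would you do it? – jakob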
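One way to try this idea without changing the mapping is to override the analyzer for a single query. The sketch below assumes that the `text` query in your Elasticsearch version accepts an `analyzer` option; `QUERY` and `search_partial` are names made up for this illustration, and the host, index, and type are the ones used in the question.

```shell
#!/bin/sh
# Hypothetical query body: the same search as in the question, except that
# the name.partial clause asks for the partial_name analyzer at query time
# (assumes the text query supports the "analyzer" option).
QUERY='{
  "query": {
    "bool": {
      "should": [
        { "text": { "name": { "boost": 1, "query": "johnssons" } } },
        { "text": { "name.partial": { "query": "johnssons", "analyzer": "partial_name" } } }
      ]
    }
  }
}'

# Wrapped in a function so that nothing is sent until it is called.
search_partial() {
  curl -XGET 'http://127.0.0.1:9200/test/venue/_search?pretty=1' -d "$QUERY"
}
```

Calling `search_partial` against the test index should show the less precise behaviour mentioned in the comment: the query is itself broken into edge n-grams, so any name sharing a short prefix such as "jo" can match.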

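This is your typical precision vs. recall tradeoff. It's really difficult to suggest something without knowing all your use cases and which types of "partial matches" you want to support. My personal preference is to use analysis that converts different variations of the same or similar names into a single term (such as a snowball or phonetic filter) and, where that isn't enough, to handle possible misspellings outside the search engine with a specialized spell checker. – imotov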

Aha. I might write another question about this =) Thanks for your time! – jakob