Elasticsearch with an nGram index does not find partial matches

I have an Elasticsearch index that was created like this:

curl -XPUT 'http://localhost:9200/person' -d '{ 
    "settings": { 
     "number_of_shards": 1, 
     "analysis": { 
      "filter": { 
       "autocomplete_filter": { 
        "type":  "edge_ngram", 
        "min_gram": 1, 
        "max_gram": 20 
       } 
      }, 
      "analyzer": { 
       "autocomplete": { 
        "type":  "custom", 
        "tokenizer": "standard", 
        "filter": [ 
         "lowercase", 
         "autocomplete_filter" 
        ] 
       } 
      } 
     } 
    } 
}'

When I query for a person named "Ian", I get two results:

curl -XGET http://localhost:9200/person/_search -d '{ 
     "query": { 
       "match": { 
         "_all": "ian" 
       } 
     } 
}'

But when I query for just the letters "ia", I should get at least as many results, and instead I get none:

curl -XGET http://localhost:9200/person/_search -d '{ 
     "query": { 
       "match": { 
         "_all": "ia" 
       } 
     } 
}'

Is something wrong with my edge_ngram filter settings? How can I fix this?

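Edit: To clarify, I want my insert statements to look along the lines of this:

curl -XPOST "http://localhost:9200/person/RANDOM_STRING HERE/ANOTHER_RANDOM_STRING" -d '{
"field1" : "value",
"field2" : "value",
"field3" : "value"
}'

After insertion, I want every field to be analyzed with edge_ngram, so that I can search any of these fields by a partial string and get this document back.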

Source

2015-04-23 johncorser

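If you just want your analyzer to be used for all types and all properties (unless otherwise specified), you only need to set it up as the "default" analyzer for the index. That will also make your original query work, including the way it uses the "_all" field. I had a hard time finding this in the ES docs (they are not always very user friendly), but here is an example. I'm using ES 1.5, but I don't think that matters.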

PUT /person
{
    "settings": {
        "number_of_shards": 1,
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "default": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                }
            }
        }
    }
}

Then I indexed the documents and ran your query, and it worked fine:

POST /person/doc/_bulk 
{"index":{"_id":1}} 
{"name":"Ian"} 
{"index":{"_id":2}} 
{"name":"Bob Smith"} 

POST /person/_search 
{
    "query": {
        "match": {
            "_all": "ia"
        }
    }
}
... 
{ 
    "took": 1, 
    "timed_out": false, 
    "_shards": { 
     "total": 1, 
     "successful": 1, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 1, 
     "max_score": 1.4142135, 
     "hits": [ 
     { 
      "_index": "person", 
      "_type": "doc", 
      "_id": "1", 
      "_score": 1.4142135, 
      "_source": { 
       "name": "Ian" 
      } 
     } 
     ] 
    } 
}

Here is the code:

http://sense.qbox.io/gist/4e2114aafc4f3c507b4f23da8bb83f3ab00e2288

Source

2015-04-23 18:37:17

Thank you so much! You're the best. – johncorser

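The _all field uses the default analyzer, "standard", unless you specify one for it. So the tokens in the _all field are not edge_ngram tokens, which is why a search for "ia" returns no results. You generally want to avoid using the _all field for partial-match searches, because it can give unexpected or confusing results.

If you still need to use the _all field, you also need to set its analyzer to "autocomplete".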
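To make the point about tokens concrete, here is a minimal Python sketch that imitates the two analysis chains. It is not Elasticsearch code: `standard_like_tokens`, `edge_ngrams`, and `autocomplete_tokens` are hypothetical helpers, and the regex is only a rough stand-in for the standard tokenizer plus the lowercase filter.

```python
import re

def standard_like_tokens(text):
    """Rough stand-in for the standard tokenizer plus the lowercase filter."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def edge_ngrams(token, min_gram=1, max_gram=20):
    """Prefixes of a token, from min_gram up to max_gram characters."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]

def autocomplete_tokens(text):
    """Standard-like tokens expanded into edge n-grams (assumed behaviour of the 'autocomplete' analyzer)."""
    return [g for t in standard_like_tokens(text) for g in edge_ngrams(t)]

standard_terms = standard_like_tokens("Ian")
autocomplete_terms = autocomplete_tokens("Ian")

print(standard_terms)      # ['ian']
print(autocomplete_terms)  # ['i', 'ia', 'ian']
print("ia" in standard_terms, "ia" in autocomplete_terms)  # False True
```

With only the whole word "ian" in the index, a term lookup for "ia" has nothing to hit. Once the prefixes are indexed, the same lookup succeeds.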

Source

2015-04-23 17:40:43

Could you provide an example of this? I want to set up a default mapping for all properties so that everything gets indexed. – johncorser

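You haven't specified any types that use your analyzer. So you defined the analyzer but never used it. When a document is saved to a new type, its mapping is defined implicitly and the standard analyzer is used, which does not produce partial-word terms, so your search for "ia" matches nothing.

One way to handle this is to define your type explicitly and specify the analyzer to use in the mapping. Here is an example with an index named "person" (like yours), a type named "doc", and a property "name" that uses the analyzer for indexing (but not for searching):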

PUT /person 
{ 
    "settings": { 
     "number_of_shards": 1, 
     "analysis": { 
      "filter": { 
       "autocomplete_filter": { 
        "type":  "edge_ngram", 
        "min_gram": 1, 
        "max_gram": 20 
       } 
      }, 
      "analyzer": { 
       "autocomplete": { 
        "type":  "custom", 
        "tokenizer": "standard", 
        "filter": [ 
         "lowercase", 
         "autocomplete_filter" 
        ] 
       } 
      } 
     } 
    }, 
    "mappings": { 
     "doc":{ 
      "properties": { 
       "name": { 
        "type": "string", 
        "index_analyzer": "autocomplete", 
        "search_analyzer": "standard" 
       } 
      } 
     } 
    } 
}

To test it, I added a couple of documents:

POST /person/doc/_bulk 
{"index":{"_id":1}} 
{"name":"Ian"} 
{"index":{"_id":2}} 
{"name":"Bob Smith"}

Then I ran a match query against the "name" field:

POST /person/_search 
{
    "query": {
        "match": {
            "name": "ia"
        }
    }
}
... 
{ 
    "took": 2, 
    "timed_out": false, 
    "_shards": { 
     "total": 1, 
     "successful": 1, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 1, 
     "max_score": 1, 
     "hits": [ 
     { 
      "_index": "person", 
      "_type": "doc", 
      "_id": "1", 
      "_score": 1, 
      "_source": { 
       "name": "Ian" 
      } 
     } 
     ] 
    } 
}
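The choice to analyze with edge n-grams at index time but not at search time can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: the helpers (`standard_like_tokens`, `edge_ngrams`, `autocomplete_tokens`, `build_index`, `match`) are hypothetical, the regex roughly approximates the standard tokenizer, the third document "Ivy" is invented for illustration, and `match` mimics a match query that ORs its terms together.

```python
import re

def standard_like_tokens(text):
    """Rough stand-in for the standard tokenizer plus the lowercase filter."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def edge_ngrams(token, min_gram=1, max_gram=20):
    """Prefixes of a token, from min_gram up to max_gram characters."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def autocomplete_tokens(text):
    """Standard-like tokens expanded into edge n-grams."""
    return [g for t in standard_like_tokens(text) for g in edge_ngrams(t)]

def build_index(docs):
    """Map each indexed term to the set of document ids that contain it."""
    index = {}
    for doc_id, name in docs.items():
        for term in autocomplete_tokens(name):  # index-time analysis
            index.setdefault(term, set()).add(doc_id)
    return index

def match(index, query, search_analyzer):
    """OR together the postings of every term the search analyzer emits."""
    hits = set()
    for term in search_analyzer(query):  # search-time analysis
        hits |= index.get(term, set())
    return sorted(hits)

docs = {1: "Ian", 2: "Bob Smith", 3: "Ivy"}  # document 3 is hypothetical
index = build_index(docs)

print(match(index, "ia", standard_like_tokens))  # [1]
print(match(index, "ia", autocomplete_tokens))   # [1, 3]
```

In this simulation, running the query text through the n-gram analyzer as well turns "ia" into the terms "i" and "ia", so anything starting with "i" also matches. Keeping a plain analyzer on the search side avoids that widening.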

Here is some code I used to test a few different things:

http://sense.qbox.io/gist/61df5d17343651884c9422198b6a6bc00a6acb04

Source

2015-04-23 17:42:31

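Hmm, the problem is that this only works when the type is 'doc', and even then only on the 'name' property. I want to be able to index every property of every type. – johncorser

See my edit, where I give a more concrete example of what I want the inserts to look like. – johncorser

Oh, ha, that's easy. Just change the name of the analyzer to "default". I'll post another answer to show it. –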
