2013-08-23

How to configure synonyms_path in Elasticsearch

I'm very new to Elasticsearch and I want to use synonyms. I added these lines to the configuration file:

index : 
    analysis : 
     analyzer : 
      synonym : 
       type : custom 
       tokenizer : whitespace 
       filter : [synonym] 
     filter : 
      synonym : 
       type : synonym 
       synonyms_path: synonyms.txt 

Then I created the index test with this mapping:

{
    "mappings" : {
        "test" : {
            "properties" : {
                "text_1" : {
                    "type" : "string",
                    "analyzer" : "synonym"
                },
                "text_2" : {
                    "search_analyzer" : "standard",
                    "index_analyzer" : "synonym",
                    "type" : "string"
                },
                "text_3" : {
                    "type" : "string",
                    "analyzer" : "synonym"
                }
            }
        }
    }
}

Then I inserted this document with type test:

{ 
"text_3" : "foo dog cat", 
"text_2" : "foo dog cat", 
"text_1" : "foo dog cat" 
} 

synonyms.txt contains "foo, bar, baz". When I search for foo, I get what I expect, but when I search for baz or bar, I get zero results:

{ 
"query":{ 
"query_string":{ 
    "query" : "bar", 
    "fields" : [ "text_1"], 
    "use_dis_max" : true, 
    "boost" : 1.0 
}}} 

Result:

{ 
"took":1, 
"timed_out":false, 
"_shards":{ 
"total":5, 
"successful":5, 
"failed":0 
}, 
"hits":{ 
"total":0, 
"max_score":null, 
"hits":[ 
] 
} 
} 

Answer

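I'm not sure whether your problem comes from the way you defined the synonyms for "bar". Since you say you're quite new, I'll give an example similar to yours that shows how Elasticsearch handles synonyms at index time and at search time. I hope it helps.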

First, create the synonyms file:

foo => foo bar, baz 
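A minimal Python sketch of how a Solr-format synonym line is usually read may help here. With "=>", every phrase on the left is replaced by all comma-separated phrases on the right, so foo becomes either the two-word phrase "foo bar" or "baz". A plain comma list such as the question's "foo, bar, baz" declares equivalent terms: with the synonym filter's default expand setting (true) each term maps to the whole group, and with expand set to false each term maps to the first one. parse_rule is a hypothetical helper for illustration, not Elasticsearch code.

```python
def parse_rule(line, expand=True):
    """Parse one Solr-format synonym line into {input_phrase: [output_phrases]}.

    Hypothetical helper: phrases are tuples of whitespace-separated tokens.
    """
    def phrases(text):
        return [tuple(p.split()) for p in text.split(",") if p.strip()]

    if "=>" in line:
        # Explicit mapping: each left-hand phrase is replaced by all right-hand phrases.
        left, right = line.split("=>", 1)
        outputs = phrases(right)
        return {src: outputs for src in phrases(left)}

    group = phrases(line)
    if expand:
        # Equivalent synonyms with expand=true: every phrase maps to the whole group.
        return {src: group for src in group}
    # expand=false: every phrase is collapsed to the first phrase of the group.
    return {src: group[:1] for src in group}


print(parse_rule("foo => foo bar, baz"))
print(parse_rule("foo, bar, baz")[("bar",)])
```

Note that the "=>" form is one-directional: baz and bar do not map back to foo, whereas the plain comma form makes all three terms interchangeable.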

Now create the index with the settings you want to test:

curl -XPUT 'http://localhost:9200/test/' -d '{ 
    "settings": { 
    "index": { 
     "analysis": { 
     "analyzer": { 
      "synonym": { 
      "tokenizer": "whitespace", 
      "filter": ["synonym"] 
      } 
     }, 
     "filter" : { 
      "synonym" : { 
       "type" : "synonym", 
       "synonyms_path" : "synonyms.txt" 
      } 
     } 
     } 
    } 
    }, 
    "mappings": { 

    "test" : { 
     "properties" : { 
     "text_1" : { 
      "type" : "string", 
      "analyzer" : "synonym" 
     }, 
     "text_2" : { 
      "search_analyzer" : "standard", 
      "index_analyzer" : "standard", 
      "type" : "string" 
     }, 
     "text_3" : { 
      "type" : "string", 
      "search_analyzer" : "synonym", 
      "index_analyzer" : "standard" 
     } 
     } 
    } 
    } 
}' 

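Note that synonyms.txt must be in the same directory as the configuration file, because a relative synonyms_path is resolved against the config directory.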
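As a sketch of that path rule, assume a config directory of /etc/elasticsearch (a common package-install location, used here only as an example). A relative synonyms_path is joined onto the config directory, while an absolute path is used as given. resolve_synonyms_path is a hypothetical helper, not part of Elasticsearch.

```python
import os


def resolve_synonyms_path(config_dir, synonyms_path):
    # Hypothetical helper: relative paths are taken relative to the
    # Elasticsearch config directory; absolute paths are used unchanged.
    if os.path.isabs(synonyms_path):
        return synonyms_path
    return os.path.normpath(os.path.join(config_dir, synonyms_path))


print(resolve_synonyms_path("/etc/elasticsearch", "synonyms.txt"))
```

So with synonyms_path: synonyms.txt, the file is expected in the config directory, next to elasticsearch.yml.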

Now index a document:

curl -XPUT 'http://localhost:9200/test/test/1' -d '{ 
    "text_3": "baz dog cat", 
    "text_2": "foo dog cat", 
    "text_1": "foo dog cat" 
}' 

Now run the searches.

Search in field text_1:

curl -XGET 'http://localhost:9200/test/_search?q=text_1:baz' 
{ 
    "took": 3, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 1, 
    "max_score": 0.15342641, 
    "hits": [ 
     { 
     "_index": "test", 
     "_type": "test", 
     "_id": "1", 
     "_score": 0.15342641, 
     "_source": { 
      "text_3": "baz dog cat", 
      "text_2": "foo dog cat", 
      "text_1": "foo dog cat" 
     } 
     } 
    ] 
    } 
} 

You get the document because baz is a synonym of foo, and at index time foo is expanded with its synonyms.
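The index-time expansion for text_1 can be modelled with a small Python sketch. It assumes a simplified version of the answer's "synonym" analyzer (whitespace tokenizer plus the rule foo => foo bar, baz), ignores token positions, and flattens the multi-word output "foo bar" into separate terms; real Lucene matching is position-aware, so this only illustrates why the term lookup succeeds.

```python
# Simplified model of the "synonym" analyzer: whitespace tokenizer plus the
# rule "foo => foo bar, baz". Positions are ignored and "foo bar" is flattened.
SYNONYMS = {"foo": ["foo", "bar", "baz"]}


def synonym_analyzer(text):
    terms = []
    for token in text.split():                      # whitespace tokenizer
        terms.extend(SYNONYMS.get(token, [token]))  # synonym filter
    return terms


# text_1 uses the synonym analyzer at index time and at search time.
indexed_terms = set(synonym_analyzer("foo dog cat"))
query_terms = synonym_analyzer("baz")
print(sorted(indexed_terms))                         # ['bar', 'baz', 'cat', 'dog', 'foo']
print(any(t in indexed_terms for t in query_terms))  # True
```

The query "baz" is not expanded (the rule only rewrites foo), but it still matches because baz was already added to the index when foo was analyzed.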

Search in field text_2:

curl -XGET 'http://localhost:9200/test/_search?q=text_2:baz' 

Result:

{ 
    "took": 2, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 0, 
    "max_score": null, 
    "hits": [] 
    } 
} 

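I don't get a hit because text_2 is indexed with the standard analyzer, which does not expand synonyms. Since I'm searching for baz and baz is not in the text, there are no results.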
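For contrast, here is the same kind of sketch for text_2, where the standard analyzer is used on both sides. The standard analyzer is approximated here as "lowercase and split on non-word characters"; that is an assumption for illustration, not its exact behaviour. No synonym filter runs, so baz never enters the index.

```python
import re


def standard_like_analyzer(text):
    # Rough stand-in for the standard analyzer: lowercase, split on
    # non-word characters. There is no synonym filter, so nothing is expanded.
    return re.findall(r"\w+", text.lower())


# text_2 uses the standard analyzer at index time and at search time.
indexed_terms = set(standard_like_analyzer("foo dog cat"))
query_terms = standard_like_analyzer("baz")
print(sorted(indexed_terms))                         # ['cat', 'dog', 'foo']
print(any(t in indexed_terms for t in query_terms))  # False
```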

Search in field text_3:

curl -XGET 'http://localhost:9200/test/_search?q=text_3:foo' 
{ 
    "took": 3, 
    "timed_out": false, 
    "_shards": { 
    "total": 5, 
    "successful": 5, 
    "failed": 0 
    }, 
    "hits": { 
    "total": 1, 
    "max_score": 0.15342641, 
    "hits": [ 
     { 
     "_index": "test", 
     "_type": "test", 
     "_id": "1", 
     "_score": 0.15342641, 
     "_source": { 
      "text_3": "baz dog cat", 
      "text_2": "foo dog cat", 
      "text_1": "foo dog cat" 
     } 
     } 
    ] 
    } 
} 

Note: text_3 is "baz dog cat".

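text_3 was indexed without expanding synonyms, but its search analyzer is the synonym analyzer. When I search for foo, the query is expanded to include baz (one of foo's synonyms), so I get the result.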
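The text_3 case moves the expansion from index time to search time. This sketch reuses the same simplified assumptions as a model (standard analyzer approximated as lowercase plus split on non-word characters, synonym filter without positions, "foo bar" flattened): the stored terms are plain, and it is the query foo that grows to include baz.

```python
import re

# Flattened form of the rule "foo => foo bar, baz" (positions ignored).
SYNONYMS = {"foo": ["foo", "bar", "baz"]}


def standard_like_analyzer(text):
    # Rough stand-in for the standard analyzer (lowercase + split on non-word chars).
    return re.findall(r"\w+", text.lower())


def synonym_analyzer(text):
    # Whitespace tokenizer followed by a simplified synonym filter.
    terms = []
    for token in text.split():
        terms.extend(SYNONYMS.get(token, [token]))
    return terms


# text_3: indexed with the standard analyzer, searched with the synonym analyzer.
indexed_terms = set(standard_like_analyzer("baz dog cat"))
query_terms = synonym_analyzer("foo")
print(query_terms)                                   # ['foo', 'bar', 'baz']
print(any(t in indexed_terms for t in query_terms))  # True
```

Search-time expansion like this lets you change the synonyms file without reindexing, at the cost of doing the expansion on every query.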

If you want to debug this, you can use the _analyze endpoint, for example:

curl -XGET 'http://localhost:9200/test/_analyze?text=foo&analyzer=synonym&pretty=true' 

Result:

{ 
    "tokens": [ 
    { 
     "token": "foo", 
     "start_offset": 0, 
     "end_offset": 3, 
     "type": "SYNONYM", 
     "position": 1 
    }, 
    { 
     "token": "baz", 
     "start_offset": 0, 
     "end_offset": 3, 
     "type": "SYNONYM", 
     "position": 1 
    }, 
    { 
     "token": "bar", 
     "start_offset": 0, 
     "end_offset": 3, 
     "type": "SYNONYM", 
     "position": 2 
    } 
    ] 
}
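Tokens that share a position are alternatives at that position. The following Python sketch takes the _analyze response above (copied verbatim as a string) and groups tokens by position: foo and baz both sit at position 1, while bar lands at position 2 because it is the second word of the multi-word synonym "foo bar".

```python
import json
from collections import defaultdict

# The _analyze response shown above, copied as a string.
response = json.loads("""
{"tokens": [
  {"token": "foo", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 1},
  {"token": "baz", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 1},
  {"token": "bar", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 2}
]}
""")

# Group token texts by their position in the token stream.
by_position = defaultdict(list)
for tok in response["tokens"]:
    by_position[tok["position"]].append(tok["token"])

for position in sorted(by_position):
    print(position, by_position[position])
# 1 ['foo', 'baz']
# 2 ['bar']
```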