2015-12-08

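ElasticSearch: completion suggester with multi-word search

I'm using the completion suggester API in ES. My implementation works (code below), but I'd like to search for multiple words in a single query. In the example below, if I query for "word", it finds "wordpress" and outputs "Found". What I'm trying to accomplish is to query with something like "word blog magazine" so that all the tags are searched and the output is still "Found". Any help would be appreciated!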

Mapping:

curl -XPUT "http://localhost:9200/test_index/" -d'
{
  "mappings": {
    "product": {
      "properties": {
        "description": {
          "type": "string"
        },
        "tags": {
          "type": "string"
        },
        "title": {
          "type": "string"
        },
        "tag_suggest": {
          "type": "completion",
          "index_analyzer": "simple",
          "search_analyzer": "simple",
          "payloads": false
        }
      }
    }
  }
}'

Add a document:

curl -XPUT "http://localhost:9200/test_index/product/1" -d'
{
  "title": "Product1",
  "description": "Product1 Description",
  "tags": [
    "blog",
    "magazine",
    "responsive",
    "two columns",
    "wordpress"
  ],
  "tag_suggest": {
    "input": [
      "blog",
      "magazine",
      "responsive",
      "two columns",
      "wordpress"
    ],
    "output": "Found"
  }
}'

The _suggest query:

curl -XPOST "http://localhost:9200/test_index/_suggest" -d'
{
  "product_suggest": {
    "text": "word",
    "completion": {
      "field": "tag_suggest"
    }
  }
}'

The results are as we would expect:

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "product_suggest": [
    {
      "text": "word",
      "offset": 0,
      "length": 4,
      "options": [
        {
          "text": "Found",
          "score": 1
        }
      ]
    }
  ]
}
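
To illustrate why a multi-word query comes back empty with this setup, here is a rough Python model of the matching. It is not Elasticsearch code. It assumes the completion suggester behaves like a prefix match of the analyzed query text against each analyzed input, and the helper names (`simple_analyze`, `suggest`) are made up for this sketch.

```python
import re

# Simplified model of completion-suggester matching. NOT Elasticsearch code;
# the prefix-match behaviour is an assumption made for illustration.

def simple_analyze(text):
    """Rough stand-in for the "simple" analyzer: lowercase, split on non-letters."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def suggest(query, inputs, output):
    """Return [output] if the analyzed query is a prefix of any analyzed input."""
    q = " ".join(simple_analyze(query))
    if not q:
        return []
    for candidate in inputs:
        if " ".join(simple_analyze(candidate)).startswith(q):
            return [output]
    return []

TAGS = ["blog", "magazine", "responsive", "two columns", "wordpress"]

print(suggest("word", TAGS, "Found"))                # prefix of "wordpress"
print(suggest("two col", TAGS, "Found"))             # prefix of "two columns"
print(suggest("word blog magazine", TAGS, "Found"))  # prefix of no single input
```

Under that assumption, "word blog magazine" would only match if some single input started with that whole string, which is why the separate tags never line up with it.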

Would you be open to using an ngram solution instead of the completion suggester? –


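I actually had an edge ngram implementation with fuzziness before, but my scores were all messed up, and I was advised to use the suggest API for faster queries over large amounts of data. What's your take on the two? A key requirement for me is searching multiple terms separated by spaces – emarel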


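That last part is easy with an ngram solution. Not sure about the scoring, though. I'm not sure whether the completion suggester can handle multiple terms; I'd have to look into it. I assume you want an OR search rather than AND, right? –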

Answer


If you're willing to switch to edge ngrams (or full ngrams, if you need them), I think it will solve your problem.

I wrote up a fairly detailed explanation of how to do this in this blog post:

https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch

But I'll give you a quick and dirty version here. The trick is to use ngrams together with the _all field and the match query's AND operator.

So with this mapping:

PUT /test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "_all": {
        "type": "string",
        "analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      },
      "properties": {
        "word": {
          "type": "string",
          "include_in_all": true
        },
        "definition": {
          "type": "string",
          "include_in_all": true
        }
      }
    }
  }
}
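
To make the analyzer concrete, here is a small Python sketch of roughly what `ngram_analyzer` emits at index time with `min_gram` 2 and `max_gram` 20. It is only an approximation: the regex tokenizer is a crude stand-in for the standard tokenizer, and the function names are invented for this example.

```python
import re

def standard_tokenize(text):
    # Crude approximation of the standard tokenizer: runs of word characters.
    return re.findall(r"\w+", text)

def edge_ngrams(token, min_gram=2, max_gram=20):
    # Prefixes of the token, from min_gram up to max_gram characters long.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def ngram_analyze(text):
    # Mirrors the filter chain in the mapping: tokenize, lowercase, edge n-grams.
    terms = []
    for token in standard_tokenize(text):
        terms.extend(edge_ngrams(token.lower()))
    return terms

print(ngram_analyze("Theocracy"))
# ['th', 'the', 'theo', 'theoc', 'theocr', 'theocra', 'theocrac', 'theocracy']
```

Every prefix of every word ends up as its own indexed term, which is what lets a partial word like "theo" match later.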

And some documents:

PUT /test_index/_bulk 
{"index":{"_index":"test_index","_type":"doc","_id":1}} 
{"word":"democracy", "definition":"government by the people; a form of government in which the supreme power is vested in the people and exercised directly by them or by their elected agents under a free electoral system."} 
{"index":{"_index":"test_index","_type":"doc","_id":2}} 
{"word":"republic", "definition":"a state in which the supreme power rests in the body of citizens entitled to vote and is exercised by representatives chosen directly or indirectly by them."} 
{"index":{"_index":"test_index","_type":"doc","_id":3}} 
{"word":"oligarchy", "definition":"a form of government in which all power is vested in a few persons or in a dominant class or clique; government by the few."} 
{"index":{"_index":"test_index","_type":"doc","_id":4}} 
{"word":"plutocracy", "definition":"the rule or power of wealth or of the wealthy."} 
{"index":{"_index":"test_index","_type":"doc","_id":5}} 
{"word":"theocracy", "definition":"a form of government in which God or a deity is recognized as the supreme civil ruler, the God's or deity's laws being interpreted by the ecclesiastical authorities."} 
{"index":{"_index":"test_index","_type":"doc","_id":6}} 
{"word":"monarchy", "definition":"a state or nation in which the supreme power is actually or nominally lodged in a monarch."} 
{"index":{"_index":"test_index","_type":"doc","_id":7}} 
{"word":"capitalism", "definition":"an economic system in which investment in and ownership of the means of production, distribution, and exchange of wealth is made and maintained chiefly by private individuals or corporations, especially as contrasted to cooperatively or state-owned means of wealth."} 
{"index":{"_index":"test_index","_type":"doc","_id":8}} 
{"word":"socialism", "definition":"a theory or system of social organization that advocates the vesting of the ownership and control of the means of production and distribution, of capital, land, etc., in the community as a whole."} 
{"index":{"_index":"test_index","_type":"doc","_id":9}} 
{"word":"communism", "definition":"a theory or system of social organization based on the holding of all property in common, actual ownership being ascribed to the community as a whole or to the state."} 
{"index":{"_index":"test_index","_type":"doc","_id":10}} 
{"word":"feudalism", "definition":"the feudal system, or its principles and practices."} 
{"index":{"_index":"test_index","_type":"doc","_id":11}} 
{"word":"monopoly", "definition":"exclusive control of a commodity or service in a particular market, or a control that makes possible the manipulation of prices."} 
{"index":{"_index":"test_index","_type":"doc","_id":12}} 
{"word":"oligopoly", "definition":"the market condition that exists when there are few sellers, as a result of which they can greatly influence price and other market factors."} 

I can apply partial matching across both fields (it will work with as many fields as you want) like this:

POST /test_index/_search
{
  "query": {
    "match": {
      "_all": {
        "query": "theo go",
        "operator": "and"
      }
    }
  }
}

which in this case returns:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.7601639,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "5",
        "_score": 0.7601639,
        "_source": {
          "word": "theocracy",
          "definition": "a form of government in which God or a deity is recognized as the supreme civil ruler, the God's or deity's laws being interpreted by the ecclesiastical authorities."
        }
      }
    ]
  }
}
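
As a rough illustration of why only the "theocracy" document comes back, the matching can be modelled in plain Python: the document text is expanded into edge n-grams at index time, the query is only split into whole words at search time, and the AND operator requires every query word to be among the indexed terms. This is a simplified sketch, not Elasticsearch code; the tokenizer is approximated with a regex, scoring is ignored, and the helper names are hypothetical.

```python
import re

def tokens(text):
    # Search-time analysis (approximation): lowercase whole words, no n-grams.
    return [t.lower() for t in re.findall(r"\w+", text)]

def index_terms(text, min_gram=2, max_gram=20):
    # Index-time analysis (approximation): every edge n-gram of every word.
    terms = set()
    for tok in tokens(text):
        for n in range(min_gram, min(len(tok), max_gram) + 1):
            terms.add(tok[:n])
    return terms

def matches_all(query, doc_fields):
    # Model of "operator": "and" on _all: each query word must be an indexed term.
    indexed = set()
    for value in doc_fields:
        indexed |= index_terms(value)
    return all(term in indexed for term in tokens(query))

theocracy = [
    "theocracy",
    "a form of government in which God or a deity is recognized as the "
    "supreme civil ruler, the God's or deity's laws being interpreted by "
    "the ecclesiastical authorities.",
]
plutocracy = ["plutocracy", "the rule or power of wealth or of the wealthy."]

print(matches_all("theo go", theocracy))   # "theo" from theocracy, "go" from government
print(matches_all("theo go", plutocracy))  # neither "theo" nor "go" is indexed
```

In this model "theo" is found among the grams of "theocracy" and "go" among the grams of "government", while the "plutocracy" document has neither.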

Here is the code I used here (there's more in the blog post):

http://sense.qbox.io/gist/e4093c25a8257499f54ced5a09f35b1eb48e4e3c

Hope that helps.


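Thanks, I've actually already checked out your blog and I think it's great! In your opinion, why would you favor the n-gram route over the suggest API for this use case? Have you seen scoring get weird when you combine n-grams with fuzziness? – emarel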


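I like ngrams because you don't need redundant data. In a large data set that can become important. Scoring is definitely an issue. My sense is that there's a workaround, but I don't know offhand how to do it. –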


Thanks. Why do you use "analyzer": "ngram_analyzer", "search_analyzer": "standard" rather than just "analyzer": "ngram_analyzer"? – emarel
