2012-10-27 55 views

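Elasticsearch - River and nGrams

I'm using ES with the river plugin, since my data is in CouchDB, and I'm trying to query using nGrams. I have basically done everything I need, except that when someone types a space, the query doesn't work properly. This is because ES splits the query into separate terms at each space.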

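This is what I need to do:

  • Query for part of a text within a string:

    Query: "Hello Wor"  Response: "Hello World, Hello Word" / Exclude: "Hello, World, Word"

  • Sort the results by criteria I specify;

  • Case-insensitive matching.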

Here is what I did, following this question: How to search for a part of a word with ElasticSearch

curl -X PUT 'localhost:9200/_river/myDB/_meta' -d ' 
{ 
"type" : "couchdb", 
"couchdb" : { 
    "host" : "localhost", 
    "port" : 5984, 
    "db" : "myDB", 
    "filter" : null 
}, 
    "index" : { 
    "index" : "myDB", 
    "type" : "myDB", 
    "bulk_size" : "100", 
    "bulk_timeout" : "10ms", 
    "analysis" : { 
       "index_analyzer" : { 
          "my_index_analyzer" : { 
             "type" : "custom", 
             "tokenizer" : "standard", 
             "filter" : ["lowercase", "mynGram"] 
          } 
       }, 
       "search_analyzer" : { 
          "my_search_analyzer" : { 
             "type" : "custom", 
             "tokenizer" : "standard", 
             "filter" : ["standard", "lowercase", "mynGram"] 
          } 
       }, 
       "filter" : { 
         "mynGram" : { 
            "type" : "nGram", 
            "min_gram" : 2, 
            "max_gram" : 50 
         } 
       } 
    } 
} 
} 
' 

I then add a mapping for sorting:

curl -s -XGET 'localhost:9200/myDB/myDB/_mapping' 
{ 
"sorting": { 
     "Title": { 
      "fields": { 
       "Title": { 
        "type": "string" 
        }, 
       "untouched": { 
        "include_in_all": false, 
        "index": "not_analyzed", 
        "type": "string" 
        } 
       }, 
       "type": "multi_field" 
     }, 
     "Year": { 
       "fields": { 
        "Year": { 
         "type": "string" 
         }, 
         "untouched": { 
          "include_in_all": false, 
          "index": "not_analyzed", 
          "type": "string" 
         } 
        }, 
        "type": "multi_field" 
     } 
    } 
    } 
    }' 

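I've included all of this just for completeness. Anyway, with this setup, which I think should work, whenever I try to get some results the space is still used to split my query, for example: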

http://localhost:9200/myDB/myDB/_search?q=Title:(Hello%20Wor)&pretty=true 

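returns anything that contains "Hello" and "Wor" (I don't normally use the parentheses, but I saw them in an example; the results look very similar either way).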
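To make the behaviour described above concrete, here is a rough Python simulation of the analysis chain. It is only a sketch, not Elasticsearch's actual implementation: `str.split()` stands in for the standard tokenizer, and the helper names `ngrams` and `analyze` are made up for illustration. The `min_gram` and `max_gram` defaults mirror the `mynGram` filter from the settings.

```python
def ngrams(token, min_gram=2, max_gram=50):
    """Return every substring of token with length min_gram..max_gram."""
    grams = []
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def analyze(text, split_on_whitespace=True):
    """Approximate tokenizer -> lowercase -> nGram filter.

    str.split() is an assumption standing in for the standard tokenizer;
    passing split_on_whitespace=False keeps the input as a single token.
    """
    tokens = text.split() if split_on_whitespace else [text]
    return [gram for token in tokens for gram in ngrams(token.lower())]


# Grams are produced per word, so none of them spans the space.
print(analyze("Hello Wor"))

# With the whole input kept as one token, grams such as "lo wo" appear.
print("lo wo" in analyze("Hello Wor", split_on_whitespace=False))
```

Because every gram comes from a single word, a query analyzed this way can match documents that contain fragments of "Hello" or of "Wor" independently, which is consistent with the results described above.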

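Any help is really appreciated, as this is bugging me quite a lot.

UPDATE: In the end, I realized that I didn't need nGrams at all. A normal index will do; simply replacing the whitespace in the query with 'AND' does the job.

Example: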

Query: "Hello World" ---> Replaced as "(*Hello AND World*)" 
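The replacement described in the update could be sketched in Python roughly as follows. The function name `build_query` is hypothetical, and the sketch assumes the user input contains no characters that are special in the query-string syntax, since it does no escaping. The field name `Title` and the URL come from the question.

```python
from urllib.parse import urlencode


def build_query(field, user_input):
    """Join the words with AND and wrap them in wildcards.

    Assumes user_input is non-empty and has no query-string special
    characters; nothing is escaped here.
    """
    words = user_input.split()
    return f"{field}:(*{' AND '.join(words)}*)"


query = build_query("Title", "Hello World")
print(query)

# URL-encode the query before appending it to the search endpoint.
url = "http://localhost:9200/myDB/myDB/_search?" + urlencode(
    {"q": query, "pretty": "true"}
)
print(url)
```

Note that the operator is written in upper case: in the Lucene query syntax a lower-case or mixed-case "and" is treated as an ordinary search term rather than as an operator.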
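
Have you tried `q=Title:(+Hello +Wor)`? –

I found that q=Title:(*Hello Wor*) works – N3sh

The problem with the nGrams is just that you are tokenizing on whitespace. I think you could use the Keyword Tokenizer instead of the standard one. – javanna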

Answers

I don't have Elasticsearch set up anymore, but maybe this from the docs helps?

http://www.elasticsearch.org/guide/reference/query-dsl/match-query.html

Types of Match Queries 

boolean 

The default match query is of type boolean. It means that the text provided is analyzed and the analysis process constructs a boolean query from the provided text. The operator flag can be set to or or and to control the boolean clauses (defaults to or). 

The analyzer can be set to control which analyzer will perform the analysis process on the text. It defaults to the field explicit mapping definition, or the default search analyzer. 

fuzziness can be set to a value (depending on the relevant type; for string types it should be a value between 0.0 and 1.0) to construct fuzzy queries for each term analyzed. The prefix_length and max_expansions can be set in this case to control the fuzzy process. If the fuzzy option is set, the query will use constant_score_rewrite as its rewrite method; the rewrite parameter controls how the query will get rewritten. 

Here is an example when providing additional parameters (note the slight change in structure, message is the field name): 

{ 
    "match" : { 
     "message" : { 
      "query" : "this is a test", 
      "operator" : "and" 
     } 
    } 
} 
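
As a minimal sketch of how the snippet above might be applied to the question, the following Python builds a search request body that requires every term to match. The helper name `match_all_terms` is made up for illustration, and the field `Title` and the text "Hello Wor" are taken from the question rather than from the documentation. The code only builds and prints the JSON body; sending it to the `_search` endpoint is left out.

```python
import json


def match_all_terms(field, text):
    """Build a search body whose match query requires all analyzed terms."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "operator": "and",  # default is "or"
                }
            }
        }
    }


body = json.dumps(match_all_terms("Title", "Hello Wor"), indent=2)
print(body)
```

Since the text is still analyzed before matching, whether "Wor" matches "World" depends on the analyzer configured for the field.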