2015-02-09 27 views
3

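Elasticsearch: matching multiple values without an analyzer

Please forgive my limited knowledge of Elasticsearch. I have an Elasticsearch collection that contains the following documents: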

{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 2,
    "dimensions": {
        "region": "Coimbra District"
    }
}
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 1,
    "dimensions": {
        "region": "Federal District"
    }
}
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 1,
    "dimensions": {
        "region": "Masovian Voivodeship"
    }
}

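These 3 JSON documents are indexed in the ES server. I have not provided any analyzer type (and I don't know how to provide one :)). I am using Spring Data Elasticsearch and execute the following query to search for documents whose region is 'Masovian Voivodeship' or 'Federal District':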

{
    "query_string" : {
        "query" : "Masovian Voivodeship OR Federal District",
        "fields" : [ "dimensions.region" ]
    }
}

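I expect it to return 2 hits. However, it returns all 3 documents (probably because the unwanted document also contains the word "District"). How can I modify the query so that it performs an EXACT match and returns only 2 documents? I build the query with the following method: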

QueryBuilders.queryString(<OR string>).field("dimensions.region") 

I have already tried QueryBuilders.termsQuery, QueryBuilders.inQuery, and QueryBuilders.matchQuery (with an array), but had no luck.

Can anyone please help? Thanks in advance.

+0

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html Try setting default_operator to AND. Or make your query "Masovian AND Voivodeship OR Federal AND District" – 2015-02-09 17:54:14

+0

Hi, I tried the query '{ "query_string": { "query": "Masovian AND Voivodeship OR Federal AND District", "fields": ["dimensions.region"] } }' but it returned no hits. – 2015-02-09 18:39:49

Answers

3

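There are a couple of things you can do here.

First, I set up an index without any explicit mapping or analysis, which means the standard analyzer will be used. This matters because the analyzer determines how the text field is tokenized, and therefore how we can query it.

So I started with: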

DELETE /test_index 

PUT /test_index 
{ 
    "settings": { 
     "number_of_shards": 1, 
     "number_of_replicas": 0 
    } 
} 

PUT /test_index/doc/1
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 2,
    "dimensions": {
        "region": "Coimbra District"
    }
}

PUT /test_index/doc/2
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 1,
    "dimensions": {
        "region": "Federal District"
    }
}

PUT /test_index/doc/3
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 1,
    "dimensions": {
        "region": "Masovian Voivodeship"
    }
}
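
To make the effect of the standard analyzer on these region values concrete, here is a minimal Python sketch. It is only a rough approximation (split on non-alphanumeric characters, then lowercase); the real analyzer follows Unicode text segmentation rules and handles many more cases. The function name `approx_standard_analyze` is made up for this illustration and is not part of any Elasticsearch client.

```python
import re


def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer (an assumption, not the
    real implementation): split on non-alphanumerics and lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


regions = ["Coimbra District", "Federal District", "Masovian Voivodeship"]

for region in regions:
    # Each region is indexed as separate lowercase terms, not as one phrase.
    print(region, "->", approx_standard_analyze(region))
```

The point is that "Coimbra District" and "Federal District" both end up with the term `district` in the index, so any query that matches on that single term will find both documents.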

Then I tried your query and got no hits. I don't understand why you had "dimensions.ga:region" in your fields parameter, but when I changed it to "dimensions.region" I got some results:

POST /test_index/doc/_search
{
    "query": {
        "query_string": {
            "query": "Masovian Voivodeship OR Federal District",
            "fields": [
                "dimensions.region"
            ]
        }
    }
}
... 
{ 
    "took": 2, 
    "timed_out": false, 
    "_shards": { 
     "total": 1, 
     "successful": 1, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 3, 
     "max_score": 0.46911472, 
     "hits": [ 
     { 
      "_index": "test_index", 
      "_type": "doc", 
      "_id": "3", 
      "_score": 0.46911472, 
      "_source": { 
       "date": "2013-12-30T00:00:00.000Z", 
       "value": 1, 
       "dimensions": { 
        "region": "Masovian Voivodeship" 
       } 
      } 
     }, 
     { 
      "_index": "test_index", 
      "_type": "doc", 
      "_id": "2", 
      "_score": 0.3533006, 
      "_source": { 
       "date": "2013-12-30T00:00:00.000Z", 
       "value": 1, 
       "dimensions": { 
        "region": "Federal District" 
       } 
      } 
     }, 
     { 
      "_index": "test_index", 
      "_type": "doc", 
      "_id": "1", 
      "_score": 0.05937162, 
      "_source": { 
       "date": "2013-12-30T00:00:00.000Z", 
       "value": 2, 
       "dimensions": { 
        "region": "Coimbra District" 
       } 
      } 
     } 
     ] 
    } 
} 

However, this returns results you don't want. One way to fix that is as follows:

POST /test_index/doc/_search
{
    "query": {
        "query_string": {
            "query": "(Masovian AND Voivodeship) OR (Federal AND District)",
            "fields": [
                "dimensions.region"
            ]
        }
    }
}
... 
{ 
    "took": 3, 
    "timed_out": false, 
    "_shards": { 
     "total": 1, 
     "successful": 1, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 2, 
     "max_score": 0.46911472, 
     "hits": [ 
     { 
      "_index": "test_index", 
      "_type": "doc", 
      "_id": "3", 
      "_score": 0.46911472, 
      "_source": { 
       "date": "2013-12-30T00:00:00.000Z", 
       "value": 1, 
       "dimensions": { 
        "region": "Masovian Voivodeship" 
       } 
      } 
     }, 
     { 
      "_index": "test_index", 
      "_type": "doc", 
      "_id": "2", 
      "_score": 0.3533006, 
      "_source": { 
       "date": "2013-12-30T00:00:00.000Z", 
       "value": 1, 
       "dimensions": { 
        "region": "Federal District" 
       } 
      } 
     } 
     ] 
    } 
} 
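
The difference between the two query strings can be illustrated with a small Python simulation. This is a sketch under simplifying assumptions: documents are reduced to sets of lowercase whitespace-separated terms, scoring is ignored, and the helper names (`terms_of`, `match_any`, `match_all`) are invented for this example. With the default operator OR, the original query behaves like "any of the four terms"; the grouped version requires both terms of a region to be present.

```python
docs = {
    "1": "Coimbra District",
    "2": "Federal District",
    "3": "Masovian Voivodeship",
}


def terms_of(text):
    # Simplified tokenization: lowercase and split on whitespace.
    return set(text.lower().split())


def match_any(text, terms):
    # True when the document contains at least one of the terms (OR).
    return bool(terms_of(text) & set(terms))


def match_all(text, terms):
    # True when the document contains every one of the terms (AND).
    return set(terms) <= terms_of(text)


# "Masovian Voivodeship OR Federal District" with default operator OR
or_hits = sorted(
    doc_id for doc_id, text in docs.items()
    if match_any(text, ["masovian", "voivodeship", "federal", "district"])
)

# "(Masovian AND Voivodeship) OR (Federal AND District)"
grouped_hits = sorted(
    doc_id for doc_id, text in docs.items()
    if match_all(text, ["masovian", "voivodeship"])
    or match_all(text, ["federal", "district"])
)

print(or_hits)       # ['1', '2', '3']
print(grouped_hits)  # ['2', '3']
```

Document 1 appears in the first result only because it shares the term `district`, which matches the 3 hits and 2 hits shown in the responses above.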

Another way to do this (which I like better), and which gives the same results, is to combine match queries with a boolean should:

POST /test_index/doc/_search
{
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "dimensions.region": {
                            "query": "Masovian Voivodeship",
                            "operator": "and"
                        }
                    }
                },
                {
                    "match": {
                        "dimensions.region": {
                            "query": "Federal District",
                            "operator": "and"
                        }
                    }
                }
            ]
        }
    }
}
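
If the list of regions is not fixed, the same request body can be generated rather than written by hand. The following Python sketch builds the bool/should structure shown above from a list of values; the function name `build_region_query` and its `field` parameter are hypothetical, and the resulting dictionary would still have to be sent to the `_search` endpoint with whatever client you use.

```python
import json


def build_region_query(regions, field="dimensions.region"):
    """Build one match clause per region, each requiring all of its
    terms (operator "and"), combined under a bool "should"."""
    should_clauses = [
        {"match": {field: {"query": region, "operator": "and"}}}
        for region in regions
    ]
    return {"query": {"bool": {"should": should_clauses}}}


body = build_region_query(["Masovian Voivodeship", "Federal District"])

# Print the JSON body that would be posted to /test_index/doc/_search.
print(json.dumps(body, indent=2))
```

Note that `"operator": "and"` only requires every term of a region to be present; a longer value that contains the same terms would, as far as I can tell, match as well, so this is stricter than the default but not a true exact match on the whole string.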

Here is the code I used:

http://sense.qbox.io/gist/bb5062a635c4f9519a411fdd3c8540eae8bdfd51

+1

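Hello @Sloan, first of all, thank you very much for the detailed answer. I tried your third solution (I also think it is the better approach) and it worked like a charm! The only thing I was missing was the 'operator'. I had not specified 'operator', so the generated query used the default operator. The default is OR, so it was searching for the tokens with OR, which is why I got 3 results (you got 3 results too when you ran the same query on your first attempt). I removed the 'ga' part from the query because it was a typo. Once again, cheers for the solution :) – 2015-02-09 23:49:02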

+0

The example in Sense is awesome! – gonzalon 2015-02-17 22:55:19