2015-07-20

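I want to run a text query but retrieve only the results that have the maximum value of a particular integer field in my data. I have read the documentation on aggregations and filters, but I can't quite work out what I'm looking for. How do I make an Elasticsearch query filter on the maximum value of a field?

For example, I have indexed some duplicate data that is identical except for one integer field; let's call that field lastseen.

So, as an example, given this data put into Elasticsearch: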

    # these two are the same except for the "lastseen" field
    curl -XPOST localhost:9200/myindex/myobject -d '{ 
    "field1": "dinner carrot potato broccoli", 
    "field2": "something here", 
    "lastseen": 1000 
    }' 

    curl -XPOST localhost:9200/myindex/myobject -d '{ 
    "field1": "dinner carrot potato broccoli", 
    "field2": "something here", 
    "lastseen": 100 
    }' 

    # and these two are the same except for the "lastseen" field
    curl -XPOST localhost:9200/myindex/myobject -d '{ 
    "field1": "fish chicken something", 
    "field2": "dinner", 
    "lastseen": 2000 
    }' 

    curl -XPOST localhost:9200/myindex/myobject -d '{ 
    "field1": "fish chicken something", 
    "field2": "dinner", 
    "lastseen": 200 
    }' 

If I query for "dinner":

curl -XPOST localhost:9200/myindex/_search -d '{ 
    "query": { 
     "query_string": { 
      "query": "dinner" 
     } 
    } 
    }' 

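I get 4 results back. I would like a filter so that I get only two results back: only the items with the maximum lastseen value.

This is obviously wrong, but hopefully it gives you an idea of what I'm after: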

{ 
    "query": { 
     "query_string": { 
      "query": "dinner" 
     } 
    }, 
    "filter": { 
      "max": "lastseen" 
     } 

} 

The results would look like this:

"hits": [ 
     { 
     ... 
     "_source": { 
      "field1": "dinner carrot potato broccoli", 
      "field2": "something here", 
      "lastseen": 1000 
     } 
     }, 
     { 
     ... 
     "_source": { 
      "field1": "fish chicken something", 
      "field2": "dinner", 
      "lastseen": 2000 
     } 
     } 
    ] 
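
To pin down the behaviour being asked for, here is a minimal Python sketch that works on the four sample documents in memory, without Elasticsearch. It assumes the second document carries "lastseen": 100 and that duplicates are documents sharing both field1 and field2. The names matches and latest_per_duplicate are made up for illustration, and matches is only a crude stand-in for query_string.

```python
# Sample documents from the question (assumption: the second one has "lastseen": 100).
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here", "lastseen": 1000},
    {"field1": "dinner carrot potato broccoli", "field2": "something here", "lastseen": 100},
    {"field1": "fish chicken something", "field2": "dinner", "lastseen": 2000},
    {"field1": "fish chicken something", "field2": "dinner", "lastseen": 200},
]


def matches(doc, word):
    """Crude stand-in for query_string: True if either text field contains the word."""
    return any(word in doc[field].lower().split() for field in ("field1", "field2"))


def latest_per_duplicate(documents, word):
    """For each (field1, field2) pair, keep only the matching doc with the largest lastseen."""
    best = {}
    for doc in documents:
        if not matches(doc, word):
            continue
        key = (doc["field1"], doc["field2"])
        if key not in best or doc["lastseen"] > best[key]["lastseen"]:
            best[key] = doc
    return list(best.values())


wanted = latest_per_duplicate(docs, "dinner")
for doc in wanted:
    print(doc["lastseen"], doc["field1"])
```

All four documents match "dinner", but only one per duplicate group survives, which is the two-result outcome shown above.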

Update 1: I tried creating a mapping that excludes lastseen from being indexed. That did not work; I still get 4 results.

curl -XPOST localhost:9200/myindex -d '{ 
    "mappings": { 
     "myobject": { 
     "properties": { 
      "lastseen": { 
      "type": "long", 
      "store": "yes", 
      "include_in_all": false 
      } 
     } 
     } 
    } 
}' 

Update 2: I tried the aggregation approach listed here to deduplicate, and it did not work. More importantly, I don't see a way to combine it with a keyword search.

+0

If you have two documents with 'lastseen: 2000', do you want both returned, or the ones that have 'lastseen: 2000' and 'lastseen: 1000'?

+0

Also, what do you consider to be duplicate documents? I see that you identify documents of this kind as having the same 'field1'.

+0

@AndreiStefan Duplicate documents will have the same field1 and field2.

Answers

4

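Not ideal, but I think it gets you what you need.

Change the mapping of your field1 field, assuming this is the field used to define "duplicate" documents, to something like this: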

PUT /lastseen 
{ 
    "mappings": { 
    "test": { 
     "properties": { 
     "field1": { 
      "type": "string", 
      "fields": { 
      "raw": { 
       "type": "string", 
       "index": "not_analyzed" 
      } 
      } 
     }, 
     "field2": { 
      "type": "string" 
     }, 
     "lastseen": { 
      "type": "long" 
     } 
     } 
    } 
    } 
} 

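This means adding a .raw subfield that is not_analyzed, so the value is indexed exactly as is, without being analyzed and split into terms. This is what makes spotting "duplicate" documents possible, to some extent.

Then you need a terms aggregation on field1.raw (to group the duplicates) with a top_hits sub-aggregation to get a single document for each field1 value: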
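
To see why the .raw subfield matters for grouping, here is a rough Python sketch of the difference. The analyzed_terms function is only an approximation of what an analyzer does (lowercase, split on non-word characters), not Elasticsearch's actual implementation, and both function names are made up for illustration.

```python
import re


def analyzed_terms(text):
    """Rough approximation of an analyzed string field: lowercase, split on non-word chars."""
    return [term for term in re.split(r"\W+", text.lower()) if term]


def raw_terms(text):
    """A not_analyzed field indexes the whole value as one single term."""
    return [text]


value = "dinner carrot potato broccoli"
print(analyzed_terms(value))  # ['dinner', 'carrot', 'potato', 'broccoli']
print(raw_terms(value))       # ['dinner carrot potato broccoli']
```

A terms aggregation builds one bucket per indexed term, so on the analyzed field it would bucket by individual words, while on the raw subfield each distinct full value gets its own bucket.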

GET /lastseen/test/_search 
{ 
    "size": 0, 
    "query": { 
    "query_string": { 
     "query": "dinner" 
    } 
    }, 
    "aggs": { 
    "field1_unique": { 
     "terms": { 
     "field": "field1.raw", 
     "size": 2 
     }, 
     "aggs": { 
     "first_one": { 
      "top_hits": { 
      "size": 1, 
      "sort": [{"lastseen": {"order":"desc"}}] 
      } 
     } 
     } 
    } 
    } 
} 

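The single document returned by top_hits is the one with the highest lastseen, which is what "sort": [{"lastseen": {"order":"desc"}}] ensures.

The results you get look like this (under aggregations, not under hits):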
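
If you build the request from code, the same body can be assembled as a plain dictionary. This is just a sketch: build_latest_per_group_query and its parameters (group_field, sort_field, max_groups) are hypothetical names, and the result still has to be sent to the _search endpoint with whatever HTTP client you use.

```python
import json


def build_latest_per_group_query(text, group_field="field1.raw",
                                 sort_field="lastseen", max_groups=2):
    """Return a search body: full-text query plus terms/top_hits aggregations."""
    return {
        "size": 0,  # no ordinary hits; the documents are read from the aggregation
        "query": {"query_string": {"query": text}},
        "aggs": {
            "field1_unique": {
                # one bucket per distinct value of the grouping field
                "terms": {"field": group_field, "size": max_groups},
                "aggs": {
                    "first_one": {
                        # within each bucket, keep only the doc with the highest sort_field
                        "top_hits": {
                            "size": 1,
                            "sort": [{sort_field: {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


body = build_latest_per_group_query("dinner")
print(json.dumps(body, indent=2))
```

Note that the terms "size" caps how many groups come back, so max_groups needs to be at least as large as the number of distinct field1 values you expect.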

... 
    "aggregations": { 
     "field1_unique": { 
     "doc_count_error_upper_bound": 0, 
     "sum_other_doc_count": 0, 
     "buckets": [ 
      { 
       "key": "dinner carrot potato broccoli", 
       "doc_count": 2, 
       "first_one": { 
        "hits": { 
        "total": 2, 
        "max_score": null, 
        "hits": [ 
         { 
          "_index": "lastseen", 
          "_type": "test", 
          "_id": "AU60ZObtjKWeJgeyudI-", 
          "_score": null, 
          "_source": { 
           "field1": "dinner carrot potato broccoli", 
           "field2": "something here", 
           "lastseen": 1000 
          }, 
          "sort": [ 
           1000 
          ] 
         } 
        ] 
        } 
       } 
      }, 
      { 
       "key": "fish chicken something", 
       "doc_count": 2, 
       "first_one": { 
        "hits": { 
        "total": 2, 
        "max_score": null, 
        "hits": [ 
         { 
          "_index": "lastseen", 
          "_type": "test", 
          "_id": "AU60ZObtjKWeJgeyudJA", 
          "_score": null, 
          "_source": { 
           "field1": "fish chicken something", 
           "field2": "dinner", 
           "lastseen": 2000 
          }, 
          "sort": [ 
           2000 
          ] 
         } 
        ] 
        } 
       } 
      } 
     ] 
     } 
    } 
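
Since the documents come back nested under aggregations rather than hits, a small helper can flatten them. A minimal sketch, assuming the response has already been parsed into a Python dict; top_documents is a made-up name, and the response below is a trimmed copy of the output above with only the keys the helper reads.

```python
def top_documents(response, agg_name="field1_unique", hits_name="first_one"):
    """Return the _source of the top hit in each bucket, skipping empty buckets."""
    documents = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        hits = bucket[hits_name]["hits"]["hits"]
        if hits:
            documents.append(hits[0]["_source"])
    return documents


# Trimmed copy of the response shown above.
response = {
    "aggregations": {
        "field1_unique": {
            "buckets": [
                {
                    "key": "dinner carrot potato broccoli",
                    "first_one": {"hits": {"hits": [{"_source": {
                        "field1": "dinner carrot potato broccoli",
                        "field2": "something here",
                        "lastseen": 1000,
                    }}]}},
                },
                {
                    "key": "fish chicken something",
                    "first_one": {"hits": {"hits": [{"_source": {
                        "field1": "fish chicken something",
                        "field2": "dinner",
                        "lastseen": 2000,
                    }}]}},
                },
            ]
        }
    }
}

latest = top_documents(response)
for doc in latest:
    print(doc["lastseen"], doc["field1"])
```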
+0

Thank you. This is exactly what I was looking for.