2017-04-20

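Elasticsearch: terms aggregation on a nested field counts duplicate values

I have a problem with nested aggregations in Elasticsearch. I have a mapping with a nested field: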

POST my_index/my_type/_mapping
{
  "properties": {
    "name": {
      "type": "keyword"
    },
    "nested_fields": {
      "type": "nested",
      "properties": {
        "key": {
          "type": "keyword"
        },
        "value": {
          "type": "keyword"
        }
      }
    }
  }
}

Then I index a document:

POST my_index/my_type
{
  "name": "object1",
  "nested_fields": [
    {
      "key": "key1",
      "value": "value1"
    },
    {
      "key": "key1",
      "value": "value2"
    }
  ]
}

As you can see, my nested array has two items with the same key field but different value fields. Then I want to run a query like this:

GET /my_index/my_type/_search
{
  "query": {
    "nested": {
      "path": "nested_fields",
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "nested_fields.key": {
                  "value": "key1"
                }
              }
            },
            {
              "terms": {
                "nested_fields.value": [
                  "value1",
                  "value2"
                ]
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "agg_nested_fields": {
      "nested": {
        "path": "nested_fields"
      },
      "aggs": {
        "agg_nested_fields_key": {
          "terms": {
            "field": "nested_fields.key",
            "size": 10
          }
        }
      }
    }
  }
}

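As you can see, I want to find all documents that have at least one object in the nested_fields array whose key property equals key1 and whose value is one of the given values (value1 or value2). Then I want to group the matching documents by nested_fields.key. But I get this response: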

{
  "took": 13,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.87546873,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "AVuLNXxiryKmA7VEwOfV",
        "_score": 0.87546873,
        "_source": {
          "name": "object1",
          "nested_fields": [
            {
              "key": "key1",
              "value": "value1"
            },
            {
              "key": "key1",
              "value": "value2"
            }
          ]
        }
      }
    ]
  },
  "aggregations": {
    "agg_nested_fields": {
      "doc_count": 2,
      "agg_nested_fields_key": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "key1",
            "doc_count": 2
          }
        ]
      }
    }
  }
}

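As you can see from the response, I have one hit (which is correct), but the document is counted twice in the aggregation (see doc_count: 2), because it has two items with the value 'key1' in the nested_fields array. How can I get the correct count in the aggregation?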

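That count is correct, because each nested element is a document in its own right. So you really do have two nested documents, each with key1 as its key and either 'value1' or 'value2' as its value. – Val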
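To make the counting difference concrete, here is a minimal Python sketch that models it in memory. It does not talk to Elasticsearch; `root_docs`, `count_nested_docs` and `count_root_docs` are hypothetical names introduced only for illustration, and the data is copied from the document in the question.

```python
from collections import Counter

# In-memory copy of the document from the question (illustration only).
root_docs = [
    {
        "_id": "AVuLNXxiryKmA7VEwOfV",
        "name": "object1",
        "nested_fields": [
            {"key": "key1", "value": "value1"},
            {"key": "key1", "value": "value2"},
        ],
    }
]


def count_nested_docs(docs):
    """Count once per nested object, as a terms agg inside a nested agg does."""
    counts = Counter()
    for doc in docs:
        for item in doc["nested_fields"]:
            counts[item["key"]] += 1
    return dict(counts)


def count_root_docs(docs):
    """Count each root document at most once per distinct key."""
    counts = Counter()
    for doc in docs:
        # A set removes repeated keys within the same root document.
        for key in {item["key"] for item in doc["nested_fields"]}:
            counts[key] += 1
    return dict(counts)


print(count_nested_docs(root_docs))  # {'key1': 2}
print(count_root_docs(root_docs))    # {'key1': 1}
```

The first function mirrors the doc_count of 2 in the response above; the second is the per-root-document count the question is asking for.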

Yes, I need this. How can I solve this problem? – Stalso

Does https://stackoverflow.com/a/27578607/7379424 help? –

Answers

You will have to use a reverse_nested aggregation inside the nested aggregation to get the aggregation counts back on the root documents.

{
  "query": {
    "nested": {
      "path": "nested_fields",
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "nested_fields.key": {
                  "value": "key1"
                }
              }
            },
            {
              "terms": {
                "nested_fields.value": [
                  "value1",
                  "value2"
                ]
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "agg_nested_fields": {
      "nested": {
        "path": "nested_fields"
      },
      "aggs": {
        "agg_nested_fields_key": {
          "terms": {
            "field": "nested_fields.key",
            "size": 10
          },
          "aggs": {
            "back_to_root": {
              "reverse_nested": {}
            }
          }
        }
      }
    }
  }
}
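
As a rough sketch of how a client could use this, the Python below builds the aggregation body and reads the per-root-document count out of each bucket. It assumes reverse_nested is given an empty body, which joins back to the root document. `build_aggs` and `root_counts` are hypothetical helper names, and `sample_response` is an assumed response fragment for the single document in the question, not output captured from a cluster.

```python
def build_aggs(path, field, size=10):
    """Build a nested -> terms -> reverse_nested aggregation body."""
    return {
        "agg_nested_fields": {
            "nested": {"path": path},
            "aggs": {
                "agg_nested_fields_key": {
                    "terms": {"field": field, "size": size},
                    # Empty body: join back to the root document.
                    "aggs": {"back_to_root": {"reverse_nested": {}}},
                }
            },
        }
    }


def root_counts(response):
    """Map each terms bucket key to its root-document count."""
    buckets = (
        response["aggregations"]["agg_nested_fields"]
        ["agg_nested_fields_key"]["buckets"]
    )
    return {b["key"]: b["back_to_root"]["doc_count"] for b in buckets}


# Assumed response shape (hypothetical values for the one indexed document).
sample_response = {
    "aggregations": {
        "agg_nested_fields": {
            "doc_count": 2,
            "agg_nested_fields_key": {
                "buckets": [
                    {
                        "key": "key1",
                        "doc_count": 2,
                        "back_to_root": {"doc_count": 1},
                    }
                ]
            },
        }
    }
}

print(root_counts(sample_response))
```

The bucket's own doc_count still counts nested documents; the number to read for root documents is the doc_count under back_to_root.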

That is not correct either. – Stalso

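How so? You want the number of parent/root docs. This is what I understood from your question: 'As you can see from the response, I have one hit (which is correct), but the document is counted twice in the aggregation (see doc_count: 2), because it has two items with the value 'key1' in the nested_fields array. How can I get the correct count in the aggregation?' Do you have more information to add about what you are trying to achieve? – user3775217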