2013-01-06 40 views
14

Multiple group-by in Elasticsearch

I need to aggregate (group by) 3 fields in ES.

Can I do that in 1 query, or do I need to use a facet and iterate for each column?

Thanks.

+0

https://github.com/elasticsearch/elasticsearch/issues/256 – ehsanul

Answers

7

You can do this in 2 ways:

1) Using multiple fields in a single facet result.

Example of a single-field facet:

curl -X GET "http://localhost:9200/sales/order/_search?pretty=true" -d '{
    "query": {
        "query_string": {
            "query": "shohi*",
            "fields": [
                "billing_name"
            ]
        }
    },
    "facets": {
        "facet_result": {
            "terms": {
                "fields": [
                    "status"
                ],
                "order": "term",
                "size": 15
            }
        }
    }
}'

Example of multiple fields in a single facet result:

curl -X GET "http://localhost:9200/sales/order/_search?pretty=true" -d '{
    "query": {
        "query_string": {
            "query": "shohi*",
            "fields": [
                "billing_name"
            ]
        }
    },
    "facets": {
        "facet_result": {
            "terms": {
                "fields": [
                    "status",
                    "customer_gender",
                    "state"
                ],
                "order": "term",
                "size": 15
            }
        }
    }
}'

2) Using multiple facets in the result set:

curl -X GET "http://localhost:9200/sales/order/_search?pretty=true" -d '{
    "query": {
        "query_string": {
            "query": "*",
            "fields": [
                "increment_id"
            ]
        }
    },
    "facets": {
        "status_facets": {
            "terms": {
                "fields": [
                    "status"
                ],
                "size": 50,
                "order": "term"
            }
        },
        "gender_facets": {
            "terms": {
                "fields": [
                    "customer_gender"
                ]
            }
        },
        "state_facets": {
            "terms": {
                "fields": [
                    "state"
                ],
                "order": "term"
            }
        }
    }
}'
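When the list of fields is not fixed, the multi-facet request body can be generated instead of written by hand. This is a minimal sketch, not part of the original answer: the helper name `build_terms_facets` and the `<field>_facets` naming scheme are assumptions made for illustration. It only builds the "facets" part of the body, so the "query" part still has to be added separately.

```python
def build_terms_facets(fields, size=None, order=None):
    """Build one terms facet per field, mirroring the multi-facet request.

    The helper name and the "<field>_facets" naming are illustrative
    assumptions, not part of any Elasticsearch client library.
    """
    facets = {}
    for field in fields:
        terms = {"fields": [field]}
        # Only include optional settings that were actually requested.
        if size is not None:
            terms["size"] = size
        if order is not None:
            terms["order"] = order
        facets[field + "_facets"] = {"terms": terms}
    return {"facets": facets}


body = build_terms_facets(["status", "customer_gender", "state"], order="term")
```

Each facet is still computed independently, so this gives one list of counts per field, not counts per combination of values.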

Reference link: http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html

+6

This is an ordinary facet query, but what about groups? I think the OP is asking to group the search results. (I need grouping in ES too.) –

28

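Starting with Elasticsearch 1.0, the new aggregations API allows grouping by multiple fields using sub-aggregations. Suppose you want to group by the fields field1, field2, and field3: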

{
    "aggs": {
        "agg1": {
            "terms": {
                "field": "field1"
            },
            "aggs": {
                "agg2": {
                    "terms": {
                        "field": "field2"
                    },
                    "aggs": {
                        "agg3": {
                            "terms": {
                                "field": "field3"
                            }
                        }
                    }
                }
            }
        }
    }
}

Of course, this can go on for as many fields as you like.
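Because the nesting is purely mechanical, the request body for any number of fields can be built in a loop. This is a minimal sketch: the function name `nested_terms_aggs` is hypothetical, and the aggregation names agg1, agg2, ... are chosen only to match the example.

```python
def nested_terms_aggs(fields):
    """Return a request body nesting one terms aggregation per field.

    Aggregations are named agg1, agg2, ... to match the hand-written
    example; the function name itself is a hypothetical helper.
    """
    body = {}
    level = body  # the dict that receives the next "aggs" entry
    for i, field in enumerate(fields, start=1):
        agg = {"terms": {"field": field}}
        level["aggs"] = {"agg%d" % i: agg}
        level = agg  # the next field is nested inside this aggregation
    return body


query = nested_terms_aggs(["field1", "field2", "field3"])
```

For three fields this produces the same structure as the hand-written JSON, and an empty field list yields an empty body.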

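Update:
For completeness, here is what the output of the above query looks like. Further below is Python code that generates the aggregation query and flattens the result into a list of dictionaries.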

{
    "aggregations": {
        "agg1": {
            "buckets": [{
                "doc_count": <count>,
                "key": <value of field1>,
                "agg2": {
                    "buckets": [{
                        "doc_count": <count>,
                        "key": <value of field2>,
                        "agg3": {
                            "buckets": [{
                                "doc_count": <count>,
                                "key": <value of field3>
                            },
                            {
                                "doc_count": <count>,
                                "key": <value of field3>
                            }, ...
                            ]
                        }
                    },
                    {
                        "doc_count": <count>,
                        "key": <value of field2>,
                        "agg3": {
                            "buckets": [{
                                "doc_count": <count>,
                                "key": <value of field3>
                            },
                            {
                                "doc_count": <count>,
                                "key": <value of field3>
                            }, ...
                            ]
                        }
                    }, ...
                    ]
                }
            },
            {
                "doc_count": <count>,
                "key": <value of field1>,
                "agg2": {
                    "buckets": [{
                        "doc_count": <count>,
                        "key": <value of field2>,
                        "agg3": {
                            "buckets": [{
                                "doc_count": <count>,
                                "key": <value of field3>
                            },
                            {
                                "doc_count": <count>,
                                "key": <value of field3>
                            }, ...
                            ]
                        }
                    },
                    {
                        "doc_count": <count>,
                        "key": <value of field2>,
                        "agg3": {
                            "buckets": [{
                                "doc_count": <count>,
                                "key": <value of field3>
                            },
                            {
                                "doc_count": <count>,
                                "key": <value of field3>
                            }, ...
                            ]
                        }
                    }, ...
                    ]
                }
            }, ...
            ]
        }
    }
}
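To make the nesting concrete, here is a small sketch that walks a hand-written response of this shape and yields one flat row per innermost bucket. The `sample` data is invented for illustration and uses only two levels (agg1, agg2) to stay short. The function name `flatten_buckets` is likewise an assumption, not part of any client library.

```python
def flatten_buckets(aggregations, agg_names):
    """Yield one dict per innermost bucket: {agg_name: key, ..., "doc_count": n}."""
    name, rest = agg_names[0], agg_names[1:]
    for bucket in aggregations[name]["buckets"]:
        if not rest:
            # Innermost level: emit the key together with its document count.
            yield {name: bucket["key"], "doc_count": bucket["doc_count"]}
        else:
            # Each bucket carries the next aggregation under its own name.
            for row in flatten_buckets(bucket, rest):
                row[name] = bucket["key"]
                yield row


# Invented sample response, two levels deep.
sample = {
    "aggregations": {
        "agg1": {"buckets": [
            {"key": "a", "doc_count": 3, "agg2": {"buckets": [
                {"key": "x", "doc_count": 2},
                {"key": "y", "doc_count": 1},
            ]}},
            {"key": "b", "doc_count": 1, "agg2": {"buckets": [
                {"key": "x", "doc_count": 1},
            ]}},
        ]}
    }
}

rows = list(flatten_buckets(sample["aggregations"], ["agg1", "agg2"]))
```

The `doc_count` in each row is the count of the innermost bucket, that is, the number of documents sharing that exact combination of values.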

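The following Python code performs the group-by given a list of fields. If you specify include_missing=True, it also includes combinations of values where some of the fields are missing (you do not need it if you are on Elasticsearch 2.0, thanks to this).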

def group_by(es, fields, include_missing):
    """Group documents by `fields` using nested terms aggregations.

    Aggregations are named after their fields; when include_missing is
    set, a sibling '<field>_missing' aggregation is added at every level.
    """
    current_level_terms = {'terms': {'field': fields[0]}}
    agg_spec = {fields[0]: current_level_terms}

    if include_missing:
        current_level_missing = {'missing': {'field': fields[0]}}
        agg_spec[fields[0] + '_missing'] = current_level_missing

    # Nest each remaining field under the previous level.
    for field in fields[1:]:
        next_level_terms = {'terms': {'field': field}}
        current_level_terms['aggs'] = {
            field: next_level_terms,
        }

        if include_missing:
            next_level_missing = {'missing': {'field': field}}
            current_level_terms['aggs'][field + '_missing'] = next_level_missing
            # The missing bucket needs the same sub-aggregations as the terms buckets.
            current_level_missing['aggs'] = {
                field: next_level_terms,
                field + '_missing': next_level_missing,
            }
            current_level_missing = next_level_missing

        current_level_terms = next_level_terms

    agg_result = es.search(body={'aggs': agg_spec})['aggregations']
    return get_docs_from_agg_result(agg_result, fields, include_missing)


def get_docs_from_agg_result(agg_result, fields, include_missing):
    """Flatten nested buckets into a list of dicts, one per combination of values."""
    current_field = fields[0]
    # Copy the list so that the response itself is not modified.
    buckets = list(agg_result[current_field]['buckets'])
    if include_missing:
        # The missing aggregation is a single bucket without a 'key'.
        buckets.append(agg_result[current_field + '_missing'])

    if len(fields) == 1:
        return [
            {
                current_field: bucket.get('key'),
                'doc_count': bucket['doc_count'],
            }
            for bucket in buckets if bucket['doc_count'] > 0
        ]

    result = []
    for bucket in buckets:
        records = get_docs_from_agg_result(bucket, fields[1:], include_missing)
        value = bucket.get('key')  # None for the missing bucket
        for record in records:
            record[current_field] = value
        result.extend(records)

    return result