2016-01-28

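Currently, our ES logs are indexed so that some fields hold a list instead of a single value. How do I correctly aggregate on a field that is a list in Elasticsearch?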

Example:

"_source": {
    "field1": ["item1", "item2", "item3"],
    "field2": "something",
    "field3": "something_else"
}

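The list does not always have the same length, of course. I am trying to find a way to aggregate the number of logs per item (so some logs will be counted more than once).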
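To make the desired counting concrete, here is a minimal Python sketch (no Elasticsearch involved) of the semantics a terms aggregation applies to a multi-valued field: each document adds one to the count of every distinct value in its list, so a log with three items is counted three times overall. The function name and sample logs are hypothetical.

```python
from collections import Counter

def count_docs_per_item(docs, field="field1"):
    """Count how many documents contain each value of a list-valued field.

    A document is counted at most once per distinct value, but it can be
    counted under several different values.
    """
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        # A field may hold a single value instead of a list
        if not isinstance(values, list):
            values = [values]
        counts.update(set(values))
    return counts

# Hypothetical sample logs with lists of different lengths
logs = [
    {"field1": ["item1", "item2", "item3"], "field2": "something1"},
    {"field1": ["item1"], "field2": "something2"},
    {"field1": ["item1", "item2"], "field2": "something3"},
]

print(dict(sorted(count_docs_per_item(logs).items())))
# {'item1': 3, 'item2': 2, 'item3': 1}
```

Three logs produce six counts in total, which is the "counted more than once" behaviour described above.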

I know I have to use aggs, but I don't know how to build the correct query (the body after -d). Can anyone help?


Can you share the result you expect? – Richa


@Richa Say I have 3 records: Record1 is {"field1": ["item1", "item2", "item3"], "another_field": "other value", ...}, Record2 is {"field1": ["item1"], ...}, Record3 is {"field1": ["item1", "item2"], ...}. I want to aggregate so that the result looks like {"item1": {Record1: {data from Record1}, Record2: {data from Record2}, Record3: {data from Record3}}, "item2": [Record1, Record3], "item3": [Record1]} – JChao

Answers


You can use the following query, which combines a terms aggregation with a top_hits sub-aggregation:

{
    "size": 0,
    "aggs": {
        "group": {
            "terms": {
                "script": "_source.field1.each{}"
            },
            "aggs": {
                "top_hits_log": {
                    "top_hits": {}
                }
            }
        }
    }
}

The output will be:

"buckets": [ 
     { 
      "key": "item1", 
      "doc_count": 3, 
      "top_hits_log": { 
       "hits": { 
       "total": 3, 
       "max_score": 1, 
       "hits": [ 
        { 
         "_index": "so", 
         "_type": "test", 
         "_id": "1", 
         "_score": 1, 
         "_source": { 
          "field1": [ 
          "item1", 
          "item2", 
          "item3" 
          ], 
          "field2": "something1" 
         } 
        }, 
        { 
         "_index": "so", 
         "_type": "test", 
         "_id": "2", 
         "_score": 1, 
         "_source": { 
          "field1": [ 
          "item1" 
          ], 
          "field2": "something2" 
         } 
        }, 
        { 
         "_index": "so", 
         "_type": "test", 
         "_id": "3", 
         "_score": 1, 
         "_source": { 
          "field1": [ 
          "item1", 
          "item2" 
          ], 
          "field2": "something3" 
         } 
        } 
       ] 
       } 
      } 
     }, 
     { 
      "key": "item2", 
      "doc_count": 2, 
      "top_hits_log": { 
       "hits": { 
       "total": 2, 
       "max_score": 1, 
       "hits": [ 
        { 
         "_index": "so", 
         "_type": "test", 
         "_id": "1", 
         "_score": 1, 
         "_source": { 
          "field1": [ 
          "item1", 
          "item2", 
          "item3" 
          ], 
          "field2": "something1" 
         } 
        }, 
        { 
         "_index": "so", 
         "_type": "test", 
         "_id": "3", 
         "_score": 1, 
         "_source": { 
          "field1": [ 
          "item1", 
          "item2" 
          ], 
          "field2": "something3" 
         } 
        } 
       ] 
       } 
      } 
     }, 
     { 
      "key": "item3", 
      "doc_count": 1, 
      "top_hits_log": { 
       "hits": { 
       "total": 1, 
       "max_score": 1, 
       "hits": [ 
        { 
         "_index": "so", 
         "_type": "test", 
         "_id": "1", 
         "_score": 1, 
         "_source": { 
          "field1": [ 
          "item1", 
          "item2", 
          "item3" 
          ], 
          "field2": "something1" 
         } 
        } 
       ] 
       } 
      } 
     } 
    ] 
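The bucket structure of this response can be mimicked in plain Python, which may help when checking results by hand. This is a sketch of the semantics only, not how Elasticsearch computes it: the function name is hypothetical, hits keep insertion order rather than score order, and hits_size defaults to 3 because that is the default size of top_hits.

```python
def terms_with_top_hits(docs, field="field1", hits_size=3):
    """Sketch of a terms aggregation with a top_hits sub-aggregation.

    Each bucket has a key, a doc_count and up to `hits_size` documents.
    Buckets are ordered by doc_count descending, then by key (a
    simplification for determinism).
    """
    members = {}
    for doc in docs:
        # One bucket entry per distinct value in the document's list
        for value in set(doc["_source"].get(field, [])):
            members.setdefault(value, []).append(doc)
    buckets = []
    for key, bucket_docs in members.items():
        buckets.append({
            "key": key,
            "doc_count": len(bucket_docs),
            "top_hits_log": {
                "hits": {
                    "total": len(bucket_docs),
                    # Real top_hits sorts by score; this keeps input order
                    "hits": bucket_docs[:hits_size],
                }
            },
        })
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets

docs = [
    {"_id": "1", "_source": {"field1": ["item1", "item2", "item3"], "field2": "something1"}},
    {"_id": "2", "_source": {"field1": ["item1"], "field2": "something2"}},
    {"_id": "3", "_source": {"field1": ["item1", "item2"], "field2": "something3"}},
]

for b in terms_with_top_hits(docs):
    print(b["key"], b["doc_count"], [h["_id"] for h in b["top_hits_log"]["hits"]["hits"]])
# item1 3 ['1', '2', '3']
# item2 2 ['1', '3']
# item3 1 ['1']
```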

Make sure dynamic scripting is enabled by setting script.disable_dynamic: false in elasticsearch.yml.

Hope this helps.


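There is no need for scripting here. It will be slow, especially because it parses _source. You also need to make sure field1 is mapped as not_analyzed: the terms aggregation works on the unique tokens in the inverted index, so on an analyzed field you will get strange results.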
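To see why an analyzed field produces strange buckets, here is a rough Python approximation. The tokenizer below (lowercase, then split on anything that is not a letter or digit) is an assumption standing in for an analyzer, not the exact rules of Elasticsearch's standard analyzer, and the sample values are hypothetical.

```python
import re
from collections import Counter

def rough_tokens(value):
    # Crude stand-in for an analyzer: lowercase, split on non-alphanumerics
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]

def term_buckets(docs, analyzed):
    """Count documents per indexed term for a list-valued field."""
    counts = Counter()
    for values in docs:
        terms = set()
        for v in values:
            # not_analyzed: the whole string is one term; analyzed: its tokens
            terms.update(rough_tokens(v) if analyzed else [v])
        counts.update(terms)
    return counts

docs = [["New York", "item-1"], ["new jersey"]]
print(sorted(term_buckets(docs, analyzed=False).items()))
# [('New York', 1), ('item-1', 1), ('new jersey', 1)]
print(sorted(term_buckets(docs, analyzed=True).items()))
# [('1', 1), ('item', 1), ('jersey', 1), ('new', 2), ('york', 1)]
```

With analysis, the buckets are fragments of the original values rather than the values themselves, which is why field1 should be not_analyzed.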

{ 
    "size": 0, 
    "aggs": { 
    "unique_items": { 
     "terms": { 
     "field": "field1", 
     "size": 100 
     }, 
     "aggs": { 
     "documents": { 
      "top_hits": { 
      "size": 10 
      } 
     } 
     } 
    } 
    } 
} 
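If you send this request from a script rather than typing it after curl -d, the same body can be built as a Python dict and serialized with json.dumps. A minimal sketch: the function name, host and index name (so, taken from the sample output above) are assumptions to adjust to your cluster.

```python
import json

def build_items_query(field="field1", bucket_size=100, hits_per_bucket=10):
    """Build the request body from this answer as a Python dict."""
    return {
        "size": 0,  # return no top-level hits, only aggregation results
        "aggs": {
            "unique_items": {
                "terms": {"field": field, "size": bucket_size},
                "aggs": {"documents": {"top_hits": {"size": hits_per_bucket}}},
            }
        },
    }

body = json.dumps(build_items_query())
# Hypothetical host and index name
print("curl -XGET 'http://localhost:9200/so/_search' -d '" + body + "'")
```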

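The size of 100 in the terms aggregation is the number of buckets (unique values) returned; change it according to how many unique values you expect to have (the default is 10). The size of 10 in top_hits is the number of documents returned per bucket.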
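The effect of the bucket size can be sketched in Python: only the buckets with the highest doc_count are kept, so values beyond the limit silently disappear from the response. The function name and the 15 sample values are hypothetical, and breaking ties by key is an assumption of this sketch.

```python
from collections import Counter

def top_terms(counts, size=10):
    """Keep the `size` buckets with the highest doc_count."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:size]

# 15 hypothetical unique values with doc counts from 20 down to 6
counts = Counter({"item%d" % i: 20 - i for i in range(15)})
print(len(top_terms(counts)))            # 10: the default size drops 5 values
print(len(top_terms(counts, size=100)))  # 15: every value fits
```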

Hope this helps!