How to correctly aggregate on a field that is a list in Elasticsearch

Currently, our ES logs are indexed so that some fields hold a list rather than a single value. How do I correctly aggregate on such a field?

For example:

_source: {
    "field1": ["item1", "item2", "item3"],
    "field2": "something",
    "field3": "something_else"
}

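Of course, the lists are not always the same length. I am trying to find a way to aggregate by the number of logs per item (so some logs will be counted more than once).

I know I have to use aggs, but I don't know how to form the correct query (the part after -d). Can anyone help?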

Source

2016-01-28 JChao

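Can you share the result you want? – Richa

@Richa Say I have 3 records: Record1 has {"field1": ["item1", "item2", "item3"], "anotherfield": "othervalue", ...}, Record2: {"field1": ["item1"], ...}, Record3: {"field1": ["item1", "item2"], ...}. Now I want to find a way to aggregate so that the result looks like {"item1": "Record1: {data-from-record1}, Record2: {data-from-record2}, Record3: {data-from-record3}", "item2": Record1, Record3, "item3": Record1} – JChao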

You can use a terms aggregation together with top_hits, as in the following query.

{
  "size": 0,
  "aggs": {
    "group": {
      "terms": {
        "script": "_source.field1.each{}"
      },
      "aggs": {
        "top_hits_log": {
          "top_hits": {}
        }
      }
    }
  }
}

The output will be:

"buckets": [
  {
    "key": "item1",
    "doc_count": 3,
    "top_hits_log": {
      "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
          {
            "_index": "so",
            "_type": "test",
            "_id": "1",
            "_score": 1,
            "_source": {
              "field1": ["item1", "item2", "item3"],
              "field2": "something1"
            }
          },
          {
            "_index": "so",
            "_type": "test",
            "_id": "2",
            "_score": 1,
            "_source": {
              "field1": ["item1"],
              "field2": "something2"
            }
          },
          {
            "_index": "so",
            "_type": "test",
            "_id": "3",
            "_score": 1,
            "_source": {
              "field1": ["item1", "item2"],
              "field2": "something3"
            }
          }
        ]
      }
    }
  },
  {
    "key": "item2",
    "doc_count": 2,
    "top_hits_log": {
      "hits": {
        "total": 2,
        "max_score": 1,
        "hits": [
          {
            "_index": "so",
            "_type": "test",
            "_id": "1",
            "_score": 1,
            "_source": {
              "field1": ["item1", "item2", "item3"],
              "field2": "something1"
            }
          },
          {
            "_index": "so",
            "_type": "test",
            "_id": "3",
            "_score": 1,
            "_source": {
              "field1": ["item1", "item2"],
              "field2": "something3"
            }
          }
        ]
      }
    }
  },
  {
    "key": "item3",
    "doc_count": 1,
    "top_hits_log": {
      "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
          {
            "_index": "so",
            "_type": "test",
            "_id": "1",
            "_score": 1,
            "_source": {
              "field1": ["item1", "item2", "item3"],
              "field2": "something1"
            }
          }
        ]
      }
    }
  }
]
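
The buckets above are close to the shape asked for in the comments (item mapped to the records that contain it). As a minimal client-side sketch, assuming the response has already been parsed into Python dicts, the buckets could be regrouped as follows. The function name group_hits_by_item and the trimmed sample_buckets data are illustrative, not part of any Elasticsearch client.

```python
from collections import OrderedDict


def group_hits_by_item(buckets, sub_agg="top_hits_log"):
    """Map each bucket key to {document id: _source} taken from its top_hits."""
    grouped = OrderedDict()
    for bucket in buckets:
        hits = bucket[sub_agg]["hits"]["hits"]
        grouped[bucket["key"]] = {hit["_id"]: hit["_source"] for hit in hits}
    return grouped


def _hit(doc_id, items, field2):
    # Helper that builds one hit in the same shape as the output above.
    return {"_id": doc_id, "_source": {"field1": items, "field2": field2}}


# Trimmed copy of the buckets shown above (only the fields the sketch reads).
rec1 = _hit("1", ["item1", "item2", "item3"], "something1")
rec2 = _hit("2", ["item1"], "something2")
rec3 = _hit("3", ["item1", "item2"], "something3")
sample_buckets = [
    {"key": "item1", "doc_count": 3, "top_hits_log": {"hits": {"hits": [rec1, rec2, rec3]}}},
    {"key": "item2", "doc_count": 2, "top_hits_log": {"hits": {"hits": [rec1, rec3]}}},
    {"key": "item3", "doc_count": 1, "top_hits_log": {"hits": {"hits": [rec1]}}},
]

result = group_hits_by_item(sample_buckets)
for item, records in result.items():
    print(item, sorted(records))
```

Note that top_hits returns only 3 hits per bucket by default, so for buckets with more documents the "size" option of top_hits has to be raised, and doc_count remains the reliable count.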

Make sure dynamic scripting is enabled by setting script.disable_dynamic: false.

Hope this helps.

Source

2016-01-29 18:42:37 Richa

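There is no need to use scripting. It will be slow, especially because of the _source parsing. You also need to make sure that field1 is not_analyzed; otherwise you will get strange results, because the terms aggregation runs on the unique tokens in the inverted index.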
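
To illustrate the not_analyzed point, here is a rough sketch of why an analyzed field gives odd buckets. It is a simulation only: rough_analyze is a hypothetical stand-in that lowercases and splits on non-word characters (the real analyzer does more), and the sample values with spaces are made up for the demonstration. The mapping dict uses the Elasticsearch 1.x/2.x string syntax, with the type name "test" assumed from the sample output in the other answer.

```python
import re
from collections import Counter

# Mapping body (1.x/2.x string syntax) that keeps each field1 value as one term.
# The type name "test" is an assumption taken from the sample output.
mapping = {
    "mappings": {
        "test": {
            "properties": {
                "field1": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}


def rough_analyze(value):
    """Hypothetical stand-in for analysis: lowercase, split on non-word chars."""
    return re.findall(r"\w+", value.lower())


def bucket_counts(docs, analyzed):
    """Count documents per term, the way a terms aggregation sees the field."""
    counts = Counter()
    for doc in docs:
        terms = set()
        for value in doc["field1"]:
            if analyzed:
                terms.update(rough_analyze(value))  # one term per token
            else:
                terms.add(value)  # the whole value is one term
        counts.update(terms)
    return dict(counts)


# Made-up values containing spaces and capitals, to make the difference visible.
docs = [
    {"field1": ["Item One", "Item Two"]},
    {"field1": ["Item One"]},
]

analyzed_counts = bucket_counts(docs, analyzed=True)
exact_counts = bucket_counts(docs, analyzed=False)
print(analyzed_counts)
print(exact_counts)
```

In the analyzed case the buckets are tokens such as "item" and "one" rather than the original list entries. With field1 kept as whole terms, the script-free query is a plain terms aggregation on the field: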

{
  "size": 0,
  "aggs": {
    "unique_items": {
      "terms": {
        "field": "field1",
        "size": 100
      },
      "aggs": {
        "documents": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}

The size of 100 here is set inside the terms aggregation; change it according to how many unique values you think you have (the default is 10).
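
As a rough illustration of what this aggregation counts, the following sketch mimics a terms aggregation on a multi-valued field in plain Python. It is a simulation under simplifying assumptions (a single shard, exact counts, ties broken by key), and the function name terms_aggregation is hypothetical, not an Elasticsearch API.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Simulate terms-aggregation buckets: documents per distinct field value."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # treat a single value like a one-element list
        counts.update(set(values))  # a document counts once per distinct value
    # Largest buckets first; tie-breaking by key is an assumption of this sketch.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


logs = [
    {"field1": ["item1", "item2", "item3"], "field2": "something1"},
    {"field1": ["item1"], "field2": "something2"},
    {"field1": ["item1", "item2"], "field2": "something3"},
]

buckets = terms_aggregation(logs, "field1", size=100)
top_two = terms_aggregation(logs, "field1", size=2)
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"])
```

Each log contributes to one bucket per list entry, which is why a single log can be counted several times, and a size smaller than the number of unique values simply drops the smallest buckets.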

Hope this helps!

Source

2016-01-29 19:08:00 ChintanShah25
