2017-08-11

Getting top groups with additional filters in Elasticsearch

This is my mapping (Elasticsearch version 5.3):

{ 
    "settings" : { 
     "number_of_shards" : 2, 
     "number_of_replicas" : 1 
    }, 
    "mappings" :{ 
     "cpt_logs_mapping" : { 
      "properties" : { 
       "channel_id" : {"type":"integer","store":"yes","index":"not_analyzed"}, 
       "playing_date" : {"type":"string","store":"yes","index":"not_analyzed"}, 
       "country_code" : {"type":"text","store":"yes","index":"analyzed"}, 
       "playtime_in_sec" : {"type":"integer","store":"yes","index":"not_analyzed"}, 
       "channel_name" : {"type":"text","store":"yes","index":"analyzed"}, 
       "device_report_tag" : {"type":"text","store":"yes","index":"analyzed"} 
      } 
     } 
    } 
} 

I want to query the index in a way similar to the following MySQL query:

SELECT 
    channel_name, 
    SUM(`playtime_in_sec`) as playtime_in_sec 
FROM 
    channel_play_times_bar_chart 
WHERE 
country_code = 'country' AND 
device_report_tag = 'device' AND 
channel_name = 'channel' AND 
playing_date BETWEEN 'date_range_start' AND 'date_range_end' 
GROUP BY channel_id 
ORDER BY SUM(`playtime_in_sec`) DESC 
LIMIT 30; 

So far, my Query DSL looks like this:

{ 
    "size": 0, 
    "aggs": { 
    "ch_agg": { 
     "terms": { 
     "field": "channel_id", 
     "size": 30 , 
     "order": { 
       "sum_agg": "desc" 
     } 
     }, 
     "aggs": { 
     "sum_agg": { 
      "sum": { 
      "field": "playtime_in_sec" 
      } 
     } 
     } 
    } 
    } 
} 

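Question 1: Although the Query DSL above returns my top 30 channel_ids by playtime, I am confused about how to add the other filters to the search, i.e. country_code, device_report_tag and playing_date.

Question 2: The other problem is that the result set contains only channel_id and the playtime field, unlike the MySQL result set, which returns the channel_name and playtime_in_sec columns. In other words, I want to aggregate on the channel_id field, but the result set should return the corresponding channel_name for each group.

NOTE: Performance matters most here, because this query runs behind a graph generator and has to scan millions of documents or more.

Test data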

hits: [ 
    { 
     _index: "cpt_logs_index", 
     _type: "cpt_logs_mapping", 
     _id: "", 
     _score: 1, 
     _source: { 
      ChID: 1453, 
      playtime_in_sec: 35, 
      device_report_tag: "mydev", 
      channel_report_tag: "Sony Six", 
      country_code: "SE", 
      @timestamp: "2017-08-11", 
     } 
    }, 
    { 
     _index: "cpt_logs_index", 
     _type: "cpt_logs_mapping", 
     _id: "", 
     _score: 1, 
     _source: { 
      ChID: 145, 
      playtime_in_sec: 25, 
      device_report_tag: "mydev", 
      channel_report_tag: "Star Movies", 
      country_code: "US", 
      @timestamp: "2017-08-11", 
     } 
    }, 
    { 
     _index: "cpt_logs_index", 
     _type: "cpt_logs_mapping", 
     _id: "", 
     _score: 1, 
     _source: { 
      ChID: 12, 
      playtime_in_sec: 15, 
      device_report_tag: "mydev", 
      channel_report_tag: "HBO", 
      country_code: "PK", 
      @timestamp: "2017-08-12", 
     } 
    } 
] 

Answer

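Question 1:

Are you looking to add a filter/query to the example above? If so, you can simply add a "query" node to the request body to restrict the documents: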

{ 
    "size": 0, 
    "query":{ 
    "bool":{ 
     "must":[ 
      {"terms": { "country_code": ["pk","us","se"] } }, 
      {"range": { "@timestamp": { "gt": "2017-01-01", "lte": "2017-08-11" } } } 
      ] 
    } 
    }, 
    "aggs": { 
    "ch_agg": { 
     "terms": { 
     "field": "ChID", 
     "size": 30 
     }, 
     "aggs":{ 
     "ch_report_tag_agg": { 
      "terms":{ 
       "field" :"channel_report_tag.keyword" 
      }, 
      "aggs":{ 
       "sum_agg":{ 
        "sum":{ 
        "field":"playtime_in_sec" 
        } 
       } 
      } 
     } 
     } 
    } 
    } 
} 

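You can use any of the normal Elasticsearch queries/filters to pre-filter the search before you start aggregating. (Regarding performance: Elasticsearch applies any filter/query before it starts aggregating, so any filtering you can do here will help a lot.)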
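To make the pre-filtering concrete, here is a minimal sketch in Python that only builds the request body as a plain dict, with no client library involved. It assumes the field names from the test data (ChID, @timestamp, country_code, device_report_tag); the helper name build_top_channels_query and its parameters are hypothetical. The conditions are placed in bool.filter rather than must, since filter clauses are not scored, and a sum sub-aggregation sits directly under the terms aggregation so that buckets can be ordered by total playtime, as the SQL ORDER BY does. Values are lowercased on the assumption that the text fields are analyzed with the standard analyzer and hold single-token values.

```python
import json


def build_top_channels_query(country, device, date_from, date_to, size=30):
    """Build a filtered 'top channels by total playtime' request body.

    Field names follow the test data in the question; adjust them if
    your index uses channel_id / playing_date instead.
    """
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "query": {
            "bool": {
                # filter context: no scoring, only yes/no matching
                "filter": [
                    # assumption: analyzed text fields store lowercase tokens
                    {"term": {"country_code": country.lower()}},
                    {"term": {"device_report_tag": device.lower()}},
                    {"range": {"@timestamp": {"gte": date_from, "lte": date_to}}},
                ]
            }
        },
        "aggs": {
            "ch_agg": {
                "terms": {
                    "field": "ChID",
                    "size": size,
                    # order buckets by the sum computed below
                    "order": {"sum_agg": "desc"},
                },
                "aggs": {
                    "sum_agg": {"sum": {"field": "playtime_in_sec"}}
                },
            }
        },
    }


if __name__ == "__main__":
    body = build_top_channels_query("SE", "mydev", "2017-01-01", "2017-08-11")
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request against the index.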

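Question 2:

Off the top of my head, I would suggest one of two solutions (unless I have completely misunderstood the question):

  1. Add agg levels for the fields you need in the output, in the order in which you want to drill down. (You can nest aggs inside aggs quite deeply, and you get a document count at each level as a bonus.)

  2. Use a top_hits aggregation at the "lowest" level of the aggs, and restrict the returned fields with "_source": { "includes": [ /* fields */ ] }.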
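The second option can be sketched as follows, again in Python that only builds and reads plain dicts. It assumes the test-data field names (ChID, channel_report_tag, playtime_in_sec); the function names and the ch_name aggregation name are hypothetical. A top_hits sub-aggregation of size 1 fetches one document per channel bucket, from which the channel name is read, and rows_from_response flattens a response of the usual aggregation shape into (channel id, channel name, total playtime) rows.

```python
def build_channel_name_query(size=30):
    """Terms agg on ChID with a sum and a one-document top_hits per bucket."""
    return {
        "size": 0,
        "aggs": {
            "ch_agg": {
                "terms": {
                    "field": "ChID",
                    "size": size,
                    "order": {"sum_agg": "desc"},
                },
                "aggs": {
                    "sum_agg": {"sum": {"field": "playtime_in_sec"}},
                    # one representative hit per bucket, name field only
                    "ch_name": {
                        "top_hits": {
                            "size": 1,
                            "_source": {"includes": ["channel_report_tag"]},
                        }
                    },
                },
            }
        },
    }


def rows_from_response(response):
    """Flatten the aggregation part of a search response into rows."""
    rows = []
    for bucket in response["aggregations"]["ch_agg"]["buckets"]:
        hits = bucket["ch_name"]["hits"]["hits"]
        name = hits[0]["_source"]["channel_report_tag"] if hits else None
        rows.append((bucket["key"], name, bucket["sum_agg"]["value"]))
    return rows
```

This keeps a single bucket level per channel, at the cost of fetching one source document per bucket; it relies on every document of a channel carrying the same name.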

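Can you provide a few records of test data?

Also, it helps to know which version of Elasticsearch you are running, since syntax and behavior differ considerably between major versions.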

Thanks for your answer. I have added some test data. I am using Elasticsearch version 5.3.

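@DanishBinSofwan Updated the example with an additional level of aggs to hold the channel id. Note that because your country code is mapped as "analyzed", you will need to filter on it in lowercase. If you want to match it as an exact term, you will have to modify the mapping or add a not_analyzed version of the field. – Peter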
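As a rough illustration of the "add a not_analyzed version" suggestion, the sketch below builds a mapping fragment in which country_code keeps its analyzed text form and gains an exact-match sub-field. It uses the Elasticsearch 5.x keyword type for that sub-field; the sub-field name keyword and both function names are assumptions chosen for the example. Documents indexed before such a mapping change would presumably need to be reindexed before the sub-field is populated.

```python
def country_code_mapping():
    """Mapping fragment: analyzed text plus an exact-match keyword sub-field."""
    return {
        "properties": {
            "country_code": {
                "type": "text",
                "fields": {
                    # hypothetical sub-field name; stores the value unanalyzed
                    "keyword": {"type": "keyword"}
                },
            }
        }
    }


def exact_country_filter(country):
    """Term filter against the sub-field; case is preserved, e.g. "SE"."""
    return {"term": {"country_code.keyword": country}}
```

With this in place, the filter can use the original uppercase codes instead of lowercased tokens.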