Data aggregation and averages over 20 billion records

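The records start out as AVRO files that are created daily with the following schema. Twenty different attribute types are stored in the "attribute_key" and "attribute_value" fields, and every measurement also carries a timestamp and a device_id.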

"fields" : [ 
{"type":"string", "name":"device_id"}, 
{"type":"string", "name":"record_date"}, 
{"type":"string", "name":"attribute_key"}, 
{"type":"string", "name":"attribute_value"}]

I have been able to take the daily files and load them into BigQuery tables sharded by month.

device_attributes201501 
device_attributes201502 
device_attributes201503 
device_attributes201504 
device_attributes201505 
device_attributes201506 
device_attributes201507 
device_attributes201508 
device_attributes201509 
device_attributes201510 
device_attributes201511 
device_attributes201512

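My problem is twofold.

I need to create a table that contains every unique device_id collected over all time, together with the latest attribute value for each attribute type.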

device_id, record_date, attribute_key, attribute_value 
    abc123  2015-10-11 attribute_1 5 
    abc123  2015-11-11 attribute_1 5 
    abc123  2015-12-11 attribute_1 10 
    abc123  2015-10-11 attribute_1 0 
    abc456  2015-10-11 attribute_1 0 
    abc789  2015-10-11 attribute_1 0 
    abc123  2015-11-11 attribute_1 0 
    abc456  2015-11-11 attribute_1 0 
    abc789  2015-11-11 attribute_1 6 
    abc123  2015-10-11 attribute_2 blue 
    abc123  2015-11-11 attribute_2 red 
    abc123  2015-12-11 attribute_2 red 
    abc456  2015-12-11 attribute_2 blue 
    abc789  2015-12-11 attribute_2 green
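To make the "latest value per device and attribute" requirement concrete, here is a minimal Python sketch that runs on a few of the sample rows above. The function name `latest_values` and the tuple layout are my own assumptions, and it also assumes `record_date` is an ISO `YYYY-MM-DD` string, so plain string comparison orders the dates correctly. When two rows share the same date (as some sample rows do), the row seen later simply wins; the question does not say how such ties should be resolved.

```python
from typing import Dict, Iterable, Tuple

# (device_id, record_date, attribute_key, attribute_value)
Row = Tuple[str, str, str, str]


def latest_values(rows: Iterable[Row]) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """Map (device_id, attribute_key) -> (record_date, attribute_value) of the newest row."""
    latest: Dict[Tuple[str, str], Tuple[str, str]] = {}
    for device_id, record_date, key, value in rows:
        current = latest.get((device_id, key))
        # ISO dates sort correctly as strings; ">=" lets a later input row win a tie.
        if current is None or record_date >= current[0]:
            latest[(device_id, key)] = (record_date, value)
    return latest


# A subset of the sample rows from the question.
SAMPLE = [
    ("abc123", "2015-10-11", "attribute_1", "5"),
    ("abc123", "2015-11-11", "attribute_1", "5"),
    ("abc123", "2015-12-11", "attribute_1", "10"),
    ("abc789", "2015-10-11", "attribute_1", "0"),
    ("abc789", "2015-11-11", "attribute_1", "6"),
    ("abc123", "2015-10-11", "attribute_2", "blue"),
    ("abc123", "2015-12-11", "attribute_2", "red"),
]

result = latest_values(SAMPLE)
for (device_id, key), (record_date, value) in sorted(result.items()):
    print(device_id, key, record_date, value)
```

This only illustrates the logic. At 20 billion rows the same grouping would have to run inside a distributed engine rather than in a single Python process.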

For some attributes, weekly, monthly, and daily averages also need to be calculated. (attribute_3 is an average of the collected samples.)

device_id, last_update, attribute_1, attribute_2 
    abc123  2015-12-11 6   red 
    abc456  2015-12-11 0   blue 
    abc789  2015-12-11 3   green
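The daily, weekly, and monthly averages come down to grouping by device and a period key, then averaging. Below is a rough Python sketch of that bucketing. The helper names `period_key` and `averages` are hypothetical, the week key uses ISO week numbering (an assumption, since the question does not define "weekly"), and the attribute values are assumed to parse as numbers.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def period_key(record_date: str, period: str) -> str:
    """Turn an ISO date string into a day, ISO-week, or month bucket label."""
    d = date.fromisoformat(record_date)
    if period == "day":
        return d.isoformat()
    if period == "week":
        iso_year, iso_week, _ = d.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if period == "month":
        return f"{d.year}-{d.month:02d}"
    raise ValueError(f"unknown period: {period}")


def averages(rows, attribute_key: str, period: str):
    """Average one numeric attribute per (device_id, period bucket)."""
    buckets = defaultdict(list)
    for device_id, record_date, key, value in rows:
        if key == attribute_key:
            buckets[(device_id, period_key(record_date, period))].append(float(value))
    return {bucket: mean(values) for bucket, values in buckets.items()}


# attribute_1 rows for two devices, taken from the sample data in the question.
SAMPLE = [
    ("abc123", "2015-10-11", "attribute_1", "5"),
    ("abc123", "2015-11-11", "attribute_1", "5"),
    ("abc123", "2015-12-11", "attribute_1", "10"),
    ("abc123", "2015-10-11", "attribute_1", "0"),
    ("abc123", "2015-11-11", "attribute_1", "0"),
    ("abc789", "2015-10-11", "attribute_1", "0"),
    ("abc789", "2015-11-11", "attribute_1", "6"),
]

monthly = averages(SAMPLE, "attribute_1", "month")
daily = averages(SAMPLE, "attribute_1", "day")
print(monthly[("abc123", "2015-10")])  # 2.5, the mean of 5 and 0
```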

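I'm curious how best to approach this, and I don't know where to go from here. The data is in BigQuery now, and I have access to the full suite of Google Cloud tools, such as Dataflow or anything else.

The data was originally in an S3 bucket, so I could also process it with any solution on AWS.

I just don't know what the smartest approach is.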

Source

2017-02-21 chews

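A BigQuery SQL query should work for what you want to do. Did you run into a problem with that approach?

+1. Crush it with SQL in BigQuery.

BigQuery, because you don't have to write a lot of code to do basic aggregations – softwarenewbie7331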
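To show how little SQL a basic aggregation needs, here is a small sketch of a per-device monthly average. It runs against an in-memory SQLite database purely as a local stand-in, so the dialect is SQLite's and not BigQuery's: in BigQuery the cast would be written `CAST(attribute_value AS FLOAT64)`, and the single table name `device_attributes` is an assumption in place of the monthly tables.

```python
import sqlite3

rows = [
    ("abc123", "2015-10-11", "attribute_1", "5"),
    ("abc123", "2015-10-11", "attribute_1", "0"),
    ("abc123", "2015-11-11", "attribute_1", "5"),
    ("abc789", "2015-10-11", "attribute_1", "0"),
    ("abc789", "2015-11-11", "attribute_1", "6"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE device_attributes ("
    "device_id TEXT, record_date TEXT, attribute_key TEXT, attribute_value TEXT)"
)
conn.executemany("INSERT INTO device_attributes VALUES (?, ?, ?, ?)", rows)

# The first 7 characters of an ISO date (YYYY-MM) serve as the month bucket.
MONTHLY_AVG_SQL = """
SELECT device_id,
       substr(record_date, 1, 7) AS month,
       AVG(CAST(attribute_value AS REAL)) AS avg_value
FROM device_attributes
WHERE attribute_key = 'attribute_1'
GROUP BY device_id, substr(record_date, 1, 7)
ORDER BY device_id, month
"""

monthly = conn.execute(MONTHLY_AVG_SQL).fetchall()
for device_id, month, avg_value in monthly:
    print(device_id, month, avg_value)
```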

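Hopefully some of these links will help you. Creating a table: https://cloud.google.com/bigquery/docs/tables#creating-a-table

The BigQuery web UI: https://cloud.google.com/bigquery/bigquery-web-ui

How to create a table from a query (from a user's blog post). It shows that you can use the BQ web UI and specify a destination table. I couldn't find this in the official documentation, so I'm not sure whether it works. If it doesn't, you will need to set up the API and write some code, as in the example above. https://chartio.com/resources/tutorials/how-to-create-a-table-from-a-query-in-google-bigquery/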
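If you go the API route, a sketch along these lines might work with the `google-cloud-bigquery` Python client. It is untested against the real data. The project, dataset, and destination names in the usage note are made up, the wildcard pattern assumes the monthly tables share the `device_attributes` prefix, and ordering by `record_date` assumes the strings are ISO-formatted so that they sort chronologically.

```python
def latest_values_sql(table_pattern: str) -> str:
    """Build a standard SQL query that keeps the newest row per device and attribute."""
    return f"""
    SELECT device_id, attribute_key, record_date, attribute_value
    FROM (
      SELECT device_id, attribute_key, record_date, attribute_value,
             ROW_NUMBER() OVER (
               PARTITION BY device_id, attribute_key
               ORDER BY record_date DESC
             ) AS rn
      FROM `{table_pattern}`
    )
    WHERE rn = 1
    """


def write_latest_values(table_pattern: str, destination: str) -> None:
    """Run the query and store its result in a destination table."""
    # Imported lazily so the SQL builder can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # relies on application default credentials
    job_config = bigquery.QueryJobConfig(
        destination=destination,
        write_disposition="WRITE_TRUNCATE",  # replace the table on every run
    )
    job = client.query(latest_values_sql(table_pattern), job_config=job_config)
    job.result()  # block until the query job finishes


sql = latest_values_sql("my-project.my_dataset.device_attributes*")
print(sql)
```

With credentials configured, calling `write_latest_values("my-project.my_dataset.device_attributes*", "my-project.my_dataset.device_latest")` would write the result to the (hypothetical) `device_latest` table. Rows that tie on `record_date` are picked arbitrarily by `ROW_NUMBER()`, so add a tiebreaker to the `ORDER BY` if that matters.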

Source

2017-02-22 01:47:48
