2017-03-22 66 views

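BigQuery: How to sample a running total over time

I have a BigQuery table that records purchases of items from a store. It contains an ItemID and a timestamp. I'm interested in the running total of purchases for each item. I have this query that produces the running totals: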

SELECT ItemID,timestamp,count(*) 
OVER 
    (PARTITION BY ItemID 
    ORDER BY timestamp ASC, ItemID) AS runningtotal 
from 
(
    SELECT * FROM [mydb.purchases] 
) 
ORDER BY timestamp 
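To make the window semantics concrete, here is a small Python model of COUNT(*) OVER (PARTITION BY ItemID ORDER BY timestamp) for local experimentation. It is a sketch, not BigQuery code: the function name running_totals and the (item_id, timestamp) tuple input are hypothetical. It assumes the standard SQL default window frame (RANGE from the start of the partition to the current row), under which purchases of the same item with identical timestamps share one count.

```python
from bisect import bisect_right
from collections import defaultdict


def running_totals(rows):
    """Model COUNT(*) OVER (PARTITION BY ItemID ORDER BY timestamp).

    rows: iterable of (item_id, timestamp) pairs (hypothetical input shape).
    Returns (item_id, timestamp, running_total) tuples ordered by timestamp.
    """
    # PARTITION BY ItemID: collect and sort each item's timestamps.
    by_item = defaultdict(list)
    for item_id, ts in rows:
        by_item[item_id].append(ts)
    for ts_list in by_item.values():
        ts_list.sort()

    result = []
    for item_id, ts_list in by_item.items():
        for ts in ts_list:
            # Purchases of this item at or before ts; bisect_right makes
            # tied timestamps share the same count, as a RANGE frame does.
            result.append((item_id, ts, bisect_right(ts_list, ts)))

    # Outer ORDER BY timestamp (item id breaks ties for determinism).
    result.sort(key=lambda r: (r[1], r[0]))
    return result
```

For example, running_totals([("a", 1), ("b", 2), ("a", 3), ("a", 3), ("b", 5)]) gives item "a" the counts 1, 3, 3 and item "b" the counts 1, 2.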

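This table has hundreds of thousands of rows. What I want to do now is take a period of time (e.g. one week) and, within that week, get 100 samples of the running total for each ItemID (so I can plot a graph without too many data points). I'm not sure how to do this. I can get 100 samples by filtering with something like "WHERE rownumber % (rowcount/100) = 0", but how can I do that for every ItemID in the table? Do I need to run a separate subquery for each ItemID and then UNION the results? Thanks.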


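Important on SO: you can mark an answer as accepted by using the tick on the left of the posted answer, below the voting. See http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work#5235 for why it is important! Voting on answers is also important: vote up the answers that are helpful. ... You can check what to do when someone answers your question: http://stackoverflow.com/help/someone-answers.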

Answers


Using standard SQL, you can put a LIMIT clause inside the ARRAY_AGG function to first collect a sample of 100 timestamps per item:

#standardSQL
SELECT ItemID, timestamp, COUNT(*)
OVER (PARTITION BY ItemID ORDER BY timestamp ASC) AS running_total
FROM (
  -- keep at most 100 timestamps per item
  SELECT ItemID, ARRAY_AGG(timestamp LIMIT 100) timestamps
  FROM `mydb.purchases`
  GROUP BY ItemID
) t, t.timestamps timestamp
ORDER BY timestamp

Alternatively, you can shuffle the timestamps with RAND() to take a random sample:

#standardSQL
SELECT ItemID, timestamp, COUNT(*)
OVER (PARTITION BY ItemID ORDER BY timestamp ASC) AS running_total
FROM (
  -- keep a random 100 timestamps per item
  SELECT ItemID, ARRAY_AGG(timestamp ORDER BY RAND() LIMIT 100) timestamps
  FROM `mydb.purchases`
  GROUP BY ItemID
) t, t.timestamps timestamp
ORDER BY timestamp
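Both ARRAY_AGG queries follow the same shape per ItemID: cap each item at 100 timestamps, unnest them, then count. The Python sketch below models that locally; sample_then_count and its rng parameter are hypothetical names. With rng=None it keeps the first n timestamps in input order, which is only a stand-in, since ARRAY_AGG(... LIMIT n) without ORDER BY returns an unspecified n values. Because the outer COUNT(*) runs over the sampled rows, each item's counts go from 1 up to at most n.

```python
import random
from bisect import bisect_right
from collections import defaultdict


def sample_then_count(rows, n=100, rng=None):
    """Rough model of the two ARRAY_AGG queries.

    rows: iterable of (item_id, timestamp) pairs (hypothetical input shape).
    rng=None       -> keep the first n timestamps per item (input order).
    rng=Random(..) -> keep a random n per item, like ORDER BY RAND() LIMIT n.
    """
    by_item = defaultdict(list)
    for item_id, ts in rows:
        by_item[item_id].append(ts)

    result = []
    for item_id, ts_list in by_item.items():
        k = min(n, len(ts_list))
        picked = ts_list[:k] if rng is None else rng.sample(ts_list, k)
        picked.sort()
        for ts in picked:
            # Running total over the sampled rows only (ties share a count).
            result.append((item_id, ts, bisect_right(picked, ts)))

    # Outer ORDER BY timestamp (item id breaks ties for determinism).
    result.sort(key=lambda r: (r[1], r[0]))
    return result


rows = [("a", t) for t in range(250)] + [("b", t) for t in range(40)]
randomized = sample_then_count(rows, n=100, rng=random.Random(0))
```

Passing a seeded random.Random keeps the random variant reproducible between runs, which BigQuery's RAND() does not.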

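The query below does exactly what you described, in the sense of the sampling.
I left out the part about selecting a week's worth of data, because that is trivial.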

#standardSQL 
SELECT 
    ItemID, 
    timestamp, 
    runningtotal 
FROM (
    SELECT 
    ItemID, 
    timestamp, 
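    -- purchases of this item up to and including this timestamp
    COUNT(1) OVER (PARTITION BY ItemID ORDER BY timestamp ASC) AS runningtotal, 
    -- position of the row within its item, used for sampling
    ROW_NUMBER() OVER (PARTITION BY ItemID ORDER BY timestamp ASC) AS rownumber, 
    -- total number of purchases of this item
    COUNT(1) OVER(PARTITION BY ItemID) AS rowcount 
    FROM `mydb.purchases` 
) 
-- keep every k-th row per item, k ~ rowcount/100; GREATEST avoids MOD(x, 0)
-- for items with fewer than 50 purchases
WHERE MOD(rownumber, GREATEST(1, CAST(rowcount/100 AS INT64))) = 0 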
-- ORDER BY ItemID, timestamp
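Because every window is partitioned by ItemID, the MOD filter runs independently for each item, so no per-item subqueries or UNION are needed. The Python sketch below models that filter locally; every_kth_per_item and target are hypothetical names. It assumes BigQuery's CAST from FLOAT64 to INT64 rounds half away from zero (reproduced with floor(x + 0.5) for non-negative x) and clamps the step to at least 1, mirroring the GREATEST guard.

```python
import math
from bisect import bisect_right
from collections import defaultdict


def every_kth_per_item(rows, target=100):
    """Model the ROW_NUMBER/MOD sampling filter, applied per ItemID.

    rows: iterable of (item_id, timestamp) pairs (hypothetical input shape).
    Returns (item_id, timestamp, running_total) tuples.
    """
    by_item = defaultdict(list)
    for item_id, ts in rows:
        by_item[item_id].append(ts)

    result = []
    for item_id, ts_list in by_item.items():
        ts_list.sort()
        # k = GREATEST(1, CAST(rowcount / target AS INT64))
        k = max(1, math.floor(len(ts_list) / target + 0.5))
        for rownumber, ts in enumerate(ts_list, start=1):
            if rownumber % k == 0:
                # runningtotal: purchases at or before ts (ties share a count)
                result.append((item_id, ts, bisect_right(ts_list, ts)))

    # Equivalent of ORDER BY ItemID, timestamp.
    return sorted(result, key=lambda r: (r[0], r[1]))


rows = ([("a", t) for t in range(1000)]
        + [("b", t) for t in range(150)]
        + [("c", t) for t in range(30)])
sampled = every_kth_per_item(rows)
```

With these rows, item "a" (1000 purchases) yields 100 samples, "b" (150 purchases, step 2) yields 75, and "c" (30 purchases, step clamped to 1) keeps all 30, so the sample count per item lands near, not exactly at, 100.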