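PostgreSQL window functions
I'm facing a query design problem and am not sure whether my approach to solving it is unnecessarily complicated. One of my current analytics queries, as an example, is: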
with intervals as (
    select
        (select '09/27/2014'::date) + (n || ' minutes')::interval start_time,
        (select '09/27/2014'::date) + ((n+60) || ' minutes')::interval end_time
    from generate_series(0, (24*60*7), 60 * 4) n
)
select
    extract(epoch from i.start_time)::numeric * 1000 as ts,
    extract(epoch from i.end_time)::numeric * 1000 as end_ts,
    sum(avg(messages.score)) over (order by i.start_time) as score
from messages
right join intervals i
    on messages.timestamp >= i.start_time and messages.timestamp < i.end_time
where messages.timestamp between '09/27/2014' and '10/04/2014'
group by i.start_time, i.end_time
order by i.start_time
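As you can probably tell, this query computes the average of the "score" attribute for the messages that fall into each time bucket, and alongside it computes a cumulative sum of those averages across the buckets (using a window).
What I want to do next is find the top 5 (for example) messages.text values closest to the average of each bucket.
Right now, my only plan is: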
1) Join messages with the time-buckets
2) Compute a score - avg(score) over (partition by start_time) as deviation and save it against each record of the joined relation
3) Compute a rank() over (order by deviation) as rank
4) Select where rank between 1 and 5
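The four steps above can be sketched as a chain of CTEs. This is only a sketch under assumptions: it assumes messages has text, score and timestamp columns as the query above implies, it takes abs() of the deviation so that "closest" counts in both directions, and it ranks per bucket (partition by start_time) rather than globally. The names joined, ranked and rnk are made up for illustration.

```sql
-- Sketch only: assumes messages(text, score, "timestamp").
with intervals as (
    select
        '2014-09-27'::timestamp + (n || ' minutes')::interval as start_time,
        '2014-09-27'::timestamp + ((n + 60) || ' minutes')::interval as end_time
    from generate_series(0, 24 * 60 * 7, 60 * 4) n
),
joined as (
    -- Steps 1 and 2: join messages to buckets and store each row's
    -- absolute deviation from its bucket average.
    select
        i.start_time,
        m.text,
        m.score,
        abs(m.score - avg(m.score) over (partition by i.start_time)) as deviation
    from messages m
    join intervals i
        on m.timestamp >= i.start_time and m.timestamp < i.end_time
),
ranked as (
    -- Step 3: rank within each bucket, reading the precomputed
    -- deviation column from the previous query level.
    select
        j.*,
        rank() over (partition by j.start_time order by j.deviation) as rnk
    from joined j
)
-- Step 4: keep the closest five per bucket.
select r.start_time, r.text, r.score, r.deviation
from ranked r
where r.rnk <= 5
order by r.start_time, r.rnk;
```

Note that rank() gives tied rows the same rank, so a bucket can return more than five rows; row_number() would cap it at exactly five, at the cost of breaking ties arbitrarily.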
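The reason I wrote this down as imperative steps is that my first attempt at designing it involved using a window function within a window function (rank() over (partition by start_time order by score - avg(score) over (partition by start_time))), and I didn't even try to see whether that was possible.
Could I get some advice pointing me in the right direction?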
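On the nested expression: PostgreSQL rejects a window function call placed inside another window's definition, so the inner avg(...) over (...) has to be computed one query level below the rank(). A minimal, self-contained sketch with made-up sample rows (the sample relation, its bucket, txt and score columns, and all values are hypothetical, not taken from the messages table):

```sql
with sample(bucket, txt, score) as (
    -- Hypothetical rows standing in for messages joined to buckets.
    values
        (1, 'a', 1.0),
        (1, 'b', 2.0),
        (1, 'c', 6.0),
        (2, 'd', 4.0),
        (2, 'e', 8.0)
),
with_avg as (
    -- Inner window: per-bucket average, materialised as a plain column.
    select s.*, avg(s.score) over (partition by s.bucket) as bucket_avg
    from sample s
)
-- Outer window: rank by distance from the already-computed average.
select
    w.bucket,
    w.txt,
    w.score,
    w.bucket_avg,
    rank() over (partition by w.bucket order by abs(w.score - w.bucket_avg)) as rnk
from with_avg w
order by w.bucket, rnk;
```

Because bucket_avg is an ordinary column by the time the outer query runs, the outer rank() contains no nested window call.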
Note: 'generate_series()' also works with timestamps. 'generate_series('2014-09-27','2014-10-04','1 hour'::interval)' might do what you want. – wildplasser 2014-10-05 10:40:57
Correction: that should be 'generate_series('2014-09-27 00:00:00','2014-10-04 00:00:00','1 hour'::interval)' – wildplasser 2014-10-05 11:29:38
@wildplasser Ah yes, you're right, that's a good refactoring suggestion and I'll fix that! ^_^ – Slania 2014-10-05 14:25:30
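The refactoring suggested in the comments might look like the following for the intervals CTE. This is a sketch, not the poster's final version: it keeps the question's spacing of one bucket start every 4 hours (the 60 * 4 minute step) with a 1 hour bucket width, rather than the 1 hour step used in the comment, and it spells out the timestamp types explicitly.

```sql
with intervals as (
    select
        start_time,
        start_time + interval '1 hour' as end_time
    from generate_series(
        timestamp '2014-09-27 00:00:00',
        timestamp '2014-10-04 00:00:00',
        interval '4 hours'  -- same spacing as stepping n by 60 * 4 minutes
    ) as start_time
)
select * from intervals order by start_time;
```

This avoids building intervals by concatenating numbers with ' minutes' and makes the start and end of the range visible directly in the call.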