2014-10-05

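PostgreSQL: window function inside a window function

I'm facing a query design problem and I'm not sure whether my approach is unnecessarily complex. One of my current analysis queries (as an example) is: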

with intervals as (
    select 
    (select '09/27/2014'::date) + (n  || ' minutes')::interval start_time, 
    (select '09/27/2014'::date) + ((n+60) || ' minutes')::interval end_time 
     from generate_series(0, (24*60*7), 60 * 4) n 
) 
    select 
    extract(epoch from i.start_time)::numeric * 1000 as ts, 
    extract(epoch from i.end_time)::numeric * 1000 as end_ts, 
    sum(avg(messages.score)) over (order by i.start_time) as score 

    from messages 
    right join intervals i 
    on messages.timestamp >= i.start_time and messages.timestamp < i.end_time 

    where messages.timestamp between '09/27/2014' and '10/04/2014' 

    group by i.start_time, i.end_time 
    order by i.start_time 

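As you can probably tell, this query computes the average of the "score" attribute for the messages that fall into each time bucket, and then uses a window to compute a running total across the buckets.

What I want to do next is find the top 5 (for example) messages.text values whose score is closest to the average of their bucket.

Right now, my only plan is: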

1) Join messages with the time-buckets 
2) Compute a score - avg(score) over (partition by start_time) as deviation and save it against each record of the joined relation 
3) Compute a rank() over (order by deviation) as rank 
4) Select where rank between 1 and 5 

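The reason I wrote this down as imperative steps is that my first attempt at a design involved a window function inside a window function (rank() over (partition by start_time order by score - avg(score) over (partition by start_time))), and I didn't even try to see whether that would work.

Could I get some advice on the right direction?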

Note: generate_series() also works with timestamps. generate_series('2014-09-27', '2014-10-04', '1 hour'::interval) might do what you want. – wildplasser 2014-10-05 10:40:57

Correction: that should be generate_series('2014-09-27 00:00:00', '2014-10-04 00:00:00', '1 hour'::interval) – wildplasser 2014-10-05 11:29:38

@wildplasser Ah yes, you're right - that's a good refactoring suggestion, I'll fix that! ^_^ – Slania 2014-10-05 14:25:30
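
The generate_series() suggestion from the comments can be sketched roughly as follows. This is only a sketch: it assumes the same one-hour buckets starting every four hours that the question's minute arithmetic produces, and the explicit timestamp casts are an addition to keep the argument types unambiguous.

```sql
-- Buckets generated directly from timestamps instead of minute offsets.
-- Assumption: a bucket starts every 4 hours and is 1 hour long, as in the question.
select
    start_time,
    start_time + interval '1 hour' as end_time
from generate_series(
    timestamp '2014-09-27 00:00:00',
    timestamp '2014-10-04 00:00:00',
    interval '4 hours'
) as start_time;
```

Both bounds are inclusive, so the last bucket starts at 2014-10-04 00:00:00, just as the last value of the integer series in the question does.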

Answers

0

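Welp, here is what I have, and it seems to work:

I'm now open to critique of the structure, performance optimizations, and redundancy in my query! ^_^ (Minus generating the time series directly instead of all the convoluted interval math, which will be fixed eventually.)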

with intervals as (
    select 
     (select '09/29/2014'::date) + (n  || ' minutes')::interval start_time, 
     (select '09/29/2014'::date) + ((n+60) || ' minutes')::interval end_time 
     from generate_series(0, (24*60*7), 60 * 4) n 
), intervaled_messages as (
    select 
     extract(epoch from i.start_time)::numeric * 1000 as ts, 
     extract(epoch from i.end_time)::numeric * 1000 as end_ts, 
     abs(score - avg(score) over (partition by i.start_time)) as deviation 
    from messages 
    right join intervals i 
     on messages.timestamp >= i.start_time and messages.timestamp < i.end_time 
    where messages.timestamp between '09/29/2014' and '10/06/2014' 
), ranked_messages as (
    select ts, end_ts, deviation, 
    rank() over (partition by ts order by deviation) as rank, 
    row_number() over (partition by ts order by deviation) as row_number 
    from intervaled_messages 
) 
select ts, end_ts, deviation, rank 
from ranked_messages 
where rank between 1 and 5 
    and row_number between 1 and 5 
order by ts; 
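
For comparison, here is a hedged sketch of a variant that also returns messages.text, which is what the question asked for. It assumes that messages has the timestamp, score and text columns mentioned in the question. It uses an inner join, because the where clause on messages.timestamp in the query above already discards buckets without messages, so the right join gains nothing there. It keeps only row_number(), since a row_number() of at most 5 already implies a rank() of at most 5. The names bucketed, ranked and rn are made up for this example.

```sql
with intervals as (
    -- one-hour buckets starting every four hours (same shape as above)
    select start_time, start_time + interval '1 hour' as end_time
    from generate_series(timestamp '2014-09-29', timestamp '2014-10-06',
                         interval '4 hours') as start_time
), bucketed as (
    select
        i.start_time,
        i.end_time,
        m.text,
        -- distance of each message's score from the average of its bucket
        abs(m.score - avg(m.score) over (partition by i.start_time)) as deviation
    from intervals i
    join messages m
      on m.timestamp >= i.start_time and m.timestamp < i.end_time
), ranked as (
    -- number the messages inside each bucket, closest to the average first
    select *,
           row_number() over (partition by start_time order by deviation) as rn
    from bucketed
)
select
    extract(epoch from start_time) * 1000 as ts,
    extract(epoch from end_time) * 1000 as end_ts,
    text,
    deviation
from ranked
where rn <= 5
order by start_time, rn;
```

If ties should all be kept, rank() could be used instead of row_number(), at the cost of possibly returning more than five rows per bucket.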
0

The direction you should head in (this is just my suggestion):

  1. Get the average score (over all records)
  2. Subtract: row score MINUS avg(score)

-- This will leave you with both positive and negative values

  3. Apply abs() to each result of step 2, in the same calculation
  4. Use rank() and order the results appropriately
  5. WHERE rank BETWEEN 1 AND 5
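
The steps above can be sketched on a small inline data set. This is a minimal illustration rather than the answerer's actual query: the sample_messages rows and the names deviations, ranked and rnk are invented for the example. PostgreSQL does not allow window function calls to be nested, so the deviation is computed in one CTE and ranked in the next.

```sql
with sample_messages(text, score) as (
    -- hypothetical sample data standing in for the messages table
    values ('a', 1.0), ('b', 2.0), ('c', 3.0), ('d', 4.0),
           ('e', 5.0), ('f', 6.0), ('g', 10.0)
), deviations as (
    -- average over all rows, subtract it from each score, take the absolute value
    select text, score,
           abs(score - avg(score) over ()) as deviation
    from sample_messages
), ranked as (
    -- rank by distance from the average, smallest distance first
    select *, rank() over (order by deviation) as rnk
    from deviations
)
-- keep the closest rows; ties can yield more than five
select text, score, deviation, rnk
from ranked
where rnk between 1 and 5;
```

To apply this per time bucket, as in the question, both windows would take a partition by on the bucket's start time.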