2011-06-01 53 views
7

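Best way to create a histogram/frequency distribution in Oracle?

I have an events table with two columns: eventkey (a unique primary key) and createtime, which stores the event's creation time in a NUMBER column as the number of milliseconds since January 1, 1970.

I want to create a "histogram" or frequency distribution showing how many events were created in each hour of the past week.

Is using the width_bucket() function the best way to write this kind of query in Oracle? Is it possible to derive the number of rows that fall into each bucket using one of the other Oracle analytic functions, instead of using width_bucket to determine which bucket each row belongs to and then doing a count(*) on top of that?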

-- 1305504000000 = 5/16/2011 12:00am GMT 
-- 1306108800000 = 5/23/2011 12:00am GMT 
select 
    -- width_bucket numbers buckets from 1, so bucket 1 starts at the window start 
    timestamp '1970-01-01 00:00:00' + numtodsinterval(1305504000000/1000 + (bucket - 1) * 60 * 60, 'second') period_start, 
    numevents 
from (
    select bucket, count(*) as numevents from (
        select eventkey, createtime, 
               width_bucket(createtime, 1305504000000, 1306108800000, 24 * 7) bucket 
        from events 
        -- half-open range: a row at exactly the upper bound would land in overflow bucket 169 
        where createtime >= 1305504000000 and createtime < 1306108800000 
    ) group by bucket 
) 
order by period_start 

Answers

10

If your createtime were a DATE column, this would be trivial:

SELECT TO_CHAR(CREATE_TIME, 'DAY:HH24'), COUNT(*) 
    FROM EVENTS 
GROUP BY TO_CHAR(CREATE_TIME, 'DAY:HH24'); 

As it is, converting the createtime column is not too hard:

select TO_CHAR( 
     TO_DATE('19700101', 'YYYYMMDD') + createtime/86400000, 
     'DAY:HH24') AS BUCKET, COUNT(*) 
    FROM EVENTS 
    WHERE createtime between 1305504000000 and 1306108800000 
group by TO_CHAR( 
     TO_DATE('19700101', 'YYYYMMDD') + createtime/86400000, 
     'DAY:HH24') 
order by 1 
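
One thing to watch with the 'DAY:HH24' label: it is a weekday name, so `order by 1` sorts the buckets alphabetically by day rather than chronologically, and the label is only unambiguous while the window is at most one week. A minimal sketch of a variant with a sortable label, assuming the same events table and week window as in the question (the `numevents` alias and the half-open range are my choices, not part of the answer above):

```sql
-- Assumption: same events table and week window as in the question.
-- 86400000 = milliseconds per day, so createtime/86400000 is days since the epoch.
select to_char(to_date('19700101', 'YYYYMMDD') + createtime/86400000,
               'YYYY-MM-DD HH24') as bucket,
       count(*) as numevents
from events
where createtime >= 1305504000000
  and createtime <  1306108800000
group by to_char(to_date('19700101', 'YYYYMMDD') + createtime/86400000,
                 'YYYY-MM-DD HH24')
order by bucket
```

The labels are in GMT, because no time zone conversion is applied to the epoch value.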

If you are looking for fencepost values (for example, where the first decile (0-10%) ends and the next one (11-20%) begins), you would do something like this:

select min(createtime) over (partition by decile) as decile_start, 
     max(createtime) over (partition by decile) as decile_end, 
     decile 
    from (select createtime, 
       ntile (10) over (order by createtime asc) as decile 
      from events 
     where createtime between 1305504000000 and 1306108800000 
     ) 
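
The query above returns one row per event, so each decile's boundaries are repeated for every row in that decile. If one row per decile is enough, a possible sketch is to aggregate instead of using windowed min/max, assuming the same events table and window:

```sql
-- Assumption: same events table and week window as in the question.
select decile,
       min(createtime) as decile_start,
       max(createtime) as decile_end
from (
    -- ntile(10) splits the ordered rows into 10 groups of near-equal size
    select createtime,
           ntile(10) over (order by createtime asc) as decile
    from events
    where createtime between 1305504000000 and 1306108800000
)
group by decile
order by decile
```

Note that ntile produces equal-count buckets, not equal-width ones, so this answers a different question than the hourly histogram.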
+0

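This works well, thanks. Not sure why I didn't think of simply truncating the date; I guess I was too caught up in figuring out how to parse and cast this odd "date" format – 2011-06-01 14:26:37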

+0

Is there a way to keep rows with a zero count for create_times that have no events? – 2014-10-29 15:59:03
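
One possible way to keep the empty hours, sketched here rather than taken from the thread: generate all 168 hours of the week with `connect by level`, then left join the per-hour counts onto them. It assumes the events table and week window from the question; `hours`, `counts`, and `hour_idx` are illustrative names.

```sql
-- Assumptions: the events table and week window from the question;
-- hours, counts and hour_idx are illustrative names.
with hours as (
    -- one row per hour of the week: 0 .. 167
    select level - 1 as hour_idx
    from dual
    connect by level <= 24 * 7
),
counts as (
    -- 3600000 = milliseconds per hour
    select floor((createtime - 1305504000000) / 3600000) as hour_idx,
           count(*) as numevents
    from events
    where createtime >= 1305504000000
      and createtime <  1306108800000
    group by floor((createtime - 1305504000000) / 3600000)
)
select timestamp '1970-01-01 00:00:00'
           + numtodsinterval(1305504000000 / 1000 + h.hour_idx * 3600, 'second') as period_start,
       nvl(c.numevents, 0) as numevents
from hours h
left join counts c on c.hour_idx = h.hour_idx
order by h.hour_idx
```

Bucketing on the raw millisecond value keeps the join key an exact integer, which avoids comparing dates that were built from fractional-day arithmetic.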

3

I'm not familiar with Oracle's date functions, but I'm sure there is an equivalent way to write this Postgres statement:

select date_trunc('hour', stamp), count(*) 
from your_data 
group by date_trunc('hour', stamp) 
order by date_trunc('hour', stamp) 
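
The statement above assumes a timestamp column named stamp. If the Postgres table stored epoch milliseconds the way the question's createtime does, a rough sketch would convert first with `to_timestamp`; the events table and its columns are carried over from the question as an assumption:

```sql
-- Assumption: a Postgres table events(createtime) holding epoch milliseconds.
select date_trunc('hour', to_timestamp(createtime / 1000.0)) as period_start,
       count(*) as numevents
from events
where createtime >= 1305504000000
  and createtime <  1306108800000
group by 1
order by 1
```

`to_timestamp` returns a timestamp with time zone, so the hour boundaries follow the session's time zone setting.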
+1

Works perfectly in PG! Really fast. – 2016-01-17 16:56:31

1

Almost the same answer as Adam's, but I would rather keep period_start as a time field where needed, so that further filtering is easier:

with 
events as 
(
    select rownum eventkey, round(dbms_random.value(1305504000000, 1306108800000)) createtime 
    from dual 
    connect by level <= 1000 
) 
select 
    trunc(timestamp '1970-01-01 00:00:00' + numtodsinterval(createtime/1000, 'second'), 'HH') period_start, 
    count(*) numevents 
from 
    events 
where 
    createtime between 1305504000000 and 1306108800000 
group by 
    trunc(timestamp '1970-01-01 00:00:00' + numtodsinterval(createtime/1000, 'second'), 'HH') 
order by 
    period_start 
+0

Can you explain the purpose of the "events as ()" part? Why are you selecting random values? I'm not familiar with Oracle syntax – 2011-06-01 14:06:14

+0

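Sorry... since I don't have a data table to run the query against, I'm generating random data to simulate what might be in your table. The "with events" clause simply lets me alias that subquery as "events", so that the rest of the query matches the events table and you can use it directly without any changes. For your purposes, just remove everything above "select trunc(...." – Craig 2011-06-01 14:23:47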

+0

Ah, thanks, I can see how that would be useful for this type of answer :) – 2011-06-01 14:26:04