2011-01-12 95 views

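SQL: fast cumulative frequency query (Postgres)

I'm looking to get cumulative frequency data out of our database. I've created a simple temp table holding every unique status update count we've seen, along with the number of users who have that many status updates.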

 Table "pg_temp_4.statuses_count_tmp"
     Column     |  Type   | Modifiers
----------------+---------+-----------
 statuses_count | integer |
 frequency      | bigint  |
Indexes:
    "statuses_count_idx" UNIQUE, btree (statuses_count)

My current query is:

select statuses_count,
       frequency/(select * from total_statuses)::float,
       (select sum(frequency)/(select * from total_statuses)::float AS percentage
        from statuses_count_tmp
        WHERE statuses_count <= SCT.statuses_count) AS cumulative_percent
FROM statuses_count_tmp AS SCT
ORDER BY statuses_count DESC;

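But this takes quite a while, and the work grows quickly with the number of rows: the correlated subquery re-sums every row up to the current one, once per output row. For the 50,000 rows I have, that is on the order of 50,000²/2 ≈ 1.25 billion row reads (quadratic growth rather than literally 50k factorial, but painful all the same). Sitting here watching the query grind away, I'm hoping there's a better solution that I just haven't thought of yet.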

I'm hoping to get something like this:

0  0.26975161  0.26975161 
1  0.15306534  0.42281695 
2  0.05513516  0.47795211 
3  0.03050646  0.50845857 
4  0.02064444  0.52910301 

Answer


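This should be solvable with window functions, assuming you're on PostgreSQL 8.4 or later. I'm guessing total_statuses is a view or temp table along the lines of select sum(frequency) from statuses_count_tmp? I've written it as a CTE here, which should mean the total is calculated just once for the duration of the statement: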

with total_statuses as (select sum(frequency) from statuses_count_tmp)
select statuses_count,
       frequency/(select * from total_statuses) as frequency,
       sum(frequency) over(order by statuses_count)
           /(select * from total_statuses) as cumulative_frequency
from statuses_count_tmp
order by statuses_count;
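
To try the window-function approach in isolation, here is a minimal sketch against a throwaway table. The table name sample_counts and its three rows are made up for illustration and are not from the question. It also replaces the CTE with an empty window, sum(frequency) over (), which is just another way of getting the grand total on every row; it needs 8.4 or later like the query above.

```sql
-- Hypothetical sample data: 10 users in total (5 + 3 + 2).
create temp table sample_counts (
    statuses_count integer primary key,
    frequency      bigint
);
insert into sample_counts values (0, 5), (1, 3), (2, 2);

select statuses_count,
       -- share of users with exactly this many status updates
       frequency / sum(frequency) over () as frequency,
       -- running total up to and including this row, divided by the grand total
       sum(frequency) over (order by statuses_count)
           / sum(frequency) over () as cumulative_frequency
from sample_counts
order by statuses_count;
```

Because sum() over a bigint column returns numeric in PostgreSQL, the divisions are not truncated to integers. With the sample rows above the cumulative column should come out as 0.5, 0.8 and 1 (printed with more decimal places), and the table is read in a single ordered pass rather than once per row.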

Without the window functions in 8.4, your best bet is simply to process the data iteratively:

create type cumulative_sum_type as (statuses_count int, frequency numeric, cumulative_frequency numeric);
create or replace function cumulative_sum() returns setof cumulative_sum_type strict stable language plpgsql as $$
declare
    running_total bigint := 0;
    total bigint;
    data_in record;
    data_out cumulative_sum_type;
begin
    -- grand total, computed once up front
    select sum(frequency) into total from statuses_count_tmp;
    -- single ordered pass over the table, carrying the running total forward
    for data_in in select statuses_count, frequency from statuses_count_tmp order by statuses_count
    loop
        data_out.statuses_count := data_in.statuses_count;
        running_total := running_total + data_in.frequency;
        data_out.frequency := data_in.frequency::numeric/total;
        data_out.cumulative_frequency := running_total::numeric/total;
        return next data_out;
    end loop;
end;
$$;
select * from cumulative_sum();
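
A quick sanity check on the function's output, as a sketch that assumes cumulative_sum() has already been created as above and that statuses_count_tmp is not empty: the row with the largest statuses_count should report a cumulative_frequency of 1.

```sql
-- Assumes cumulative_sum() from the answer above exists in this session.
-- The outer ORDER BY is explicit so the result does not rely on the order
-- in which the function happened to emit its rows.
select statuses_count, cumulative_frequency
from cumulative_sum()
order by statuses_count desc
limit 1;
```

If that value is anything other than 1, the most likely cause is that the temp table changed between the total being computed and the loop running.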

Ah, no such luck. We're on 8.3.9 with no real hope of updating it in the next few days, but I'll keep this solution in mind once we do get upgraded. – Peck 2011-01-12 20:27:47