2013-01-07 155 views
2

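How to query time-series data from a DB when the stored time slices are larger than the wanted time slice?

How do I query data by time slice when the time slices stored in the data are larger than the slice I need? The end result will be used to draw a stacked bar chart.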

Example data; the wanted time slice is 100 "units".

START_TS (int)| END_TS (int) | DATA (int) | GROUP 
----------------------------------- 
0  | 179  | 2000 | G1 
180  | 499  | 1000 | G2 
500  | 699  | 1000 | G1 
845 ... 

Wanted output. END_TS is not needed in the output, but it helps to understand the calculation.

START_TS | END_TS | DATA (equation = amount in that time slice) | GROUP 
------------------------------------------------------- 
0  | 99 | (2000/180) * 100 = 1111 | G1 
100  | 199 | (2000/180) * 80 = 889 | G1 
100  | 199 | (1000/320) * 20 = 63 | G2 
200  | 299 | (1000/320) * 100 = 313 | G2 
300  | 399 | (1000/320) * 100 = 313 | G2 
400  | 499 | (1000/320) * 100 = 313 | G2 
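
The arithmetic in the table above can be checked outside the database. The following is a minimal Python sketch, not part of the original question; the names `split_row` and `round_half_up` are made up for illustration. It assumes END_TS is inclusive and that halves round up, which is what the 63 for 62.5 in the table implies.

```python
def round_half_up(numerator, denominator):
    """Integer division rounded half up (assumes non-negative inputs)."""
    return (2 * numerator + denominator) // (2 * denominator)


def split_row(start_ts, end_ts, data, slice_len=100):
    """Yield (slice_start, slice_end, amount) for one row; end_ts is inclusive."""
    length = end_ts - start_ts + 1
    # First slice boundary at or before the row's start.
    slice_start = (start_ts // slice_len) * slice_len
    while slice_start <= end_ts:
        slice_end = slice_start + slice_len - 1
        # Number of units of this row that fall inside the current slice.
        covered = min(end_ts, slice_end) - max(start_ts, slice_start) + 1
        yield slice_start, slice_end, round_half_up(data * covered, length)
        slice_start += slice_len


g1 = list(split_row(0, 179, 2000))
g2 = list(split_row(180, 499, 1000))
print(g1)  # [(0, 99, 1111), (100, 199, 889)]
print(g2)  # [(100, 199, 63), (200, 299, 313), (300, 399, 313), (400, 499, 313)]
```

Each row contributes `data / (end_ts - start_ts + 1)` per unit, multiplied by the number of units it shares with the slice.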

Getting a time series out of this goes like this.

SELECT (startts/100)*100, ... 
FROM TABLE 
    FULL JOIN 
     (SELECT startts from generate_series(0,700,100) startts) s1 
    USING (startts) 
GROUP BY startts/100 

So the result would look like this (without the GROUP BY):

STARTTS | ENDTS | DATA | GROUP 
    0  | 179  | 2000 | G1 
    100 |  
    180 | 499  | 1000 | G2 
    200 | 
    300 | 
    400 | 
    500 | 699  | 1000 | G1 
    600 | 
    700 

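But how do I split the data of a row that spans two or more of the generated rows (the time slice rows) when calculating the time slices?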


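This basically works, but it is not really usable for large data sets, with row counts on the order of 1-100M rows.

Here is the query that does it, with a few more non-overlapping time slices: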

SELECT (start_ts/100)*100 as start_ts, sum(part) as data, cgroup 
FROM (
SELECT *, (data * (overlap_end-overlap_start + 1)/(end_ts - tts + 1)) as part 
FROM 
    (
    SELECT (case when s1.start_ts > t.start_ts then s1.start_ts else t.start_ts end) as overlap_start, 
     -- <= (not <) keeps overlap_end inside the slice when end_ts equals the next slice's start 
     (case when s1.start_ts+100 <= t.end_ts then s1.start_ts+100-1 else t.end_ts end) as overlap_end, 
     t.start_ts as tts, s1.start_ts as start_ts, t.end_ts, cgroup, data 
    FROM (SELECT start_ts from generate_series(0,800,100) start_ts) s1 
     LEFT OUTER JOIN test t on t.start_ts < s1.start_ts+100 and t.end_ts >= s1.start_ts 
    ) t 
) t2 
GROUP BY start_ts/100, cgroup 
+0

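You have a "duplicate" desired row (START_TS = 100, END_TS = 199). Did you want it aggregated with the other portion? Also, you're aware that any sort of splitting you do is going to be completely made up/averaged, right? You don't know _when_ inside the original time slice the values occurred. It's like a tourist wondering why the guide book said "bring a coat" when the average temperature for the year was 90°F: it was for the one 40°F day of the year. It's usually better to build this sort of thing from the raw data. Is that available?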

+0

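Yes, I want two start_ts values in the "100" slice, because they show the value of each group in that slice. I know the result will be made up/averaged, but that is the wanted behaviour for now. I'm drawing stacked bars, or really a stacked line chart, where each line is 1 pixel wide and stacks all the groups in that slice. The raw data may be around, but it will only be used after a certain zoom level is reached, and that is outside the scope of this question.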

Answers

1

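What you need to do is split the different time slots across the bins defined by the series and aggregate the values. The following query does this by modifying the join condition and calculating the overlap between the two: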

SELECT (startts/100)*100, ... 
from (select (case when s1.startts > t.start_ts then s1.startts else t.start_ts end) as overlap_start, 
      -- <= keeps overlap_end inside the bin when end_ts equals the next bin's start 
      (case when s1.startts+100 <= t.end_ts then s1.startts+100-1 else t.end_ts end) as overlap_end, 
      s1.startts, t.* 
     FROM (SELECT startts from generate_series(0,700,100) startts) s1 left outer join 
      TABLE t 
      on t.start_ts < s1.startts+100 and 
       t.end_ts >= s1.startts 
    ) t 
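
The join condition and the two CASE expressions amount to an interval-overlap test followed by a max/min. A small Python sketch of that logic, under the assumption that both the bins and the rows use inclusive end values (the function name `overlap` is hypothetical, not from the answer):

```python
def overlap(bin_start, row_start, row_end, slice_len=100):
    """Return the inclusive (overlap_start, overlap_end) of a bin and a row,
    or None when they do not intersect (the join would produce no match)."""
    bin_end = bin_start + slice_len - 1
    # Same test as the join: row starts before the bin ends
    # and row ends at or after the bin starts.
    if not (row_start < bin_start + slice_len and row_end >= bin_start):
        return None
    return max(bin_start, row_start), min(bin_end, row_end)


print(overlap(100, 0, 179))    # (100, 179)
print(overlap(100, 180, 499))  # (180, 199)
print(overlap(200, 0, 179))    # None
```

The number of overlapping units is then `overlap_end - overlap_start + 1`, which is the factor applied to the row's per-unit value.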
+0

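**Absolutely amazing**, this is it, thank you. Even though I can read and understand the query, it somehow doesn't fit in my head well enough for me to compose one myself. SQL is so mysterious in a way.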

+0

Overlapping time periods are hard to visualize.

+0

There seems to be a problem with the real data. The data set is large, 1-100M rows, so this approach will be too slow.

0

SQL Fiddle. For clarity, it shows all the calculated columns of each step.

with data_avg as (
    select start_ts, end_ts, "data" * 1.0/((end_ts + 1) - start_ts) data_avg 
    from test 
), gs as (
    select start_ts, start_ts + 99 end_ts 
    from generate_series(
     (select min(start_ts) from test), 
     (select max(end_ts) from test), 
     100 
    ) gs(start_ts) 
) 
select 
    t_start, t_end, 
    gs_start, gs_end, 
    cgroup, 
    s."start", s."end", 
    da.start_ts da_start, da.end_ts da_end 
    ,round((s."end" - s."start" + 1) * da.data_avg) "data" 
from (
    select 
     t.start_ts t_start, t.end_ts t_end, 
     gs.start_ts gs_start, gs.end_ts gs_end, 
     cgroup, 
     greatest(t.start_ts, gs.start_ts) "start", least(t.end_ts, gs.end_ts) "end" 
    from 
     test t 
     inner join 
     gs on 
      -- general overlap test: also matches rows lying entirely inside one slice 
      t.start_ts <= gs.end_ts 
      and 
      t.end_ts >= gs.start_ts 
    ) s 
    inner join 
    data_avg da on 
     da.start_ts between t_start and t_end 
     and 
     da.end_ts between t_start and t_end 
order by s."start" 

Result:

t_start | t_end | gs_start | gs_end | cgroup | start | end | da_start | da_end | data 
---------+-------+----------+--------+--------+-------+-----+----------+--------+------ 
     0 | 179 |  0 |  99 | G1  |  0 | 99 |  0 | 179 | 1111 
     0 | 179 |  100 | 199 | G1  | 100 | 179 |  0 | 179 | 889 
    180 | 499 |  100 | 199 | G2  | 180 | 199 |  180 | 499 | 63 
    180 | 499 |  200 | 299 | G2  | 200 | 299 |  180 | 499 | 313 
    180 | 499 |  300 | 399 | G2  | 300 | 399 |  180 | 499 | 313 
    180 | 499 |  400 | 499 | G2  | 400 | 499 |  180 | 499 | 313 
    500 | 699 |  500 | 599 | G1  | 500 | 599 |  500 | 699 | 500 
    500 | 699 |  600 | 699 | G1  | 600 | 699 |  500 | 699 | 500
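
For the stacked chart, the per-slice shares still have to be summed per slice and group whenever several rows of the same group touch one slice. The sketch below is an illustration in Python rather than SQL, with a hypothetical function name `stack_by_slice`. It visits each row once and only the slices that row touches, and it keeps exact fractions so that rounding can be deferred until after summing. It assumes inclusive end_ts values and slices aligned to multiples of 100.

```python
from collections import defaultdict
from fractions import Fraction


def stack_by_slice(rows, slice_len=100):
    """Sum each row's share per (slice_start, group).

    rows: iterable of (start_ts, end_ts, data, group), end_ts inclusive.
    """
    totals = defaultdict(Fraction)
    for start_ts, end_ts, data, group in rows:
        per_unit = Fraction(data, end_ts - start_ts + 1)
        first = (start_ts // slice_len) * slice_len
        # Only the slices this row actually touches are visited.
        for slice_start in range(first, end_ts + 1, slice_len):
            lo = max(start_ts, slice_start)
            hi = min(end_ts, slice_start + slice_len - 1)
            totals[(slice_start, group)] += per_unit * (hi - lo + 1)
    return dict(totals)


rows = [(0, 179, 2000, "G1"), (180, 499, 1000, "G2"), (500, 699, 1000, "G1")]
totals = stack_by_slice(rows)
for (slice_start, group), amount in sorted(totals.items()):
    print(slice_start, group, float(amount))
```

Whether a single pass like this is fast enough for 1-100M rows depends on the environment; it is only meant to show that the work per row is proportional to the number of slices the row spans.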