Performing aggregation by date and time in SQL

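I have a dataset containing several weeks of observations at a 2-minute frequency. I want to increase the time interval from 2 minutes to 5 minutes. The problem is that the frequency of the observations is not always the same. I mean, theoretically there should be 5 observations every 10 minutes, but usually that is not the case. Please let me know how I can aggregate the observations based on the average function and on the time and date of the observations. In other words, I want aggregation over every 5 minutes, where the number of observations is not the same for each 5-minute interval. Moreover, my timestamps are in date-and-time format.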

Example data:

1 2007-09-14 22:56:12 5.39 
2 2007-09-14 22:58:12 5.34 
3 2007-09-14 23:00:12 5.16 
4 2007-09-14 23:02:12 5.54 
5 2007-09-14 23:04:12 5.30 
6 2007-09-14 23:06:12 5.20

Expected result:

1 2007-09-14 23:00 5.29 
2 2007-09-14 23:05 5.34

Source

2012-10-22 A.Amidi

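Post sample data: what you have and what you need. Write it as insert statements so the sample is easy to test. Also, let us know which database brand you are using. – danihp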

@danihp Example data: [1 2007-09-14 22:56:12 5.39 2 2007-09-14 22:58:12 5.34 3 2007-09-14 23:00:12 5.16 4 2007-09-14 23:02:12 5.54 5 2007-09-14 23:04:12 5.30 6 2007-09-14 23:06:12 5.20] Expected result: 1 2007-09-14 23:00 5.29 2 2007-09-14 23:06 5.34. I am using PostgreSQL. –

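@aliamidi - You really should provide that kind of information in the question, not in comments. See my edit to your question... Also, could you explain why your output is what you expect? Why is the second record '23:06' and not '23:05'? Where does the expected '5.34' come from? – MatBailie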

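The answers to this question very likely provide a good solution to your problem and show ways to aggregate data efficiently into time windows.

Essentially, use the avg aggregate with: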

GROUP BY floor(extract(epoch from the_timestamp)/60/5)
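To see what that GROUP BY expression does, here is a minimal sketch in Python (not PostgreSQL) that applies the same arithmetic, floor(epoch / 300), to the sample rows from the question. The helper names `bucket_start` and `average_by_bucket` are made up for illustration, and treating the timestamps as UTC is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timezone

BUCKET_SECONDS = 5 * 60  # same divisor as /60/5 in the GROUP BY expression


def bucket_start(ts, width=BUCKET_SECONDS):
    """Floor a naive timestamp (assumed to be UTC) to the start of its bucket."""
    epoch = int(ts.replace(tzinfo=timezone.utc).timestamp())
    floored = epoch // width * width
    return datetime.fromtimestamp(floored, tz=timezone.utc).replace(tzinfo=None)


def average_by_bucket(rows, width=BUCKET_SECONDS):
    """Group (timestamp, value) rows by bucket start and average each group."""
    groups = defaultdict(list)
    for ts, value in rows:
        groups[bucket_start(ts, width)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(groups.items())}


# Sample rows copied from the question.
rows = [
    (datetime(2007, 9, 14, 22, 56, 12), 5.39),
    (datetime(2007, 9, 14, 22, 58, 12), 5.34),
    (datetime(2007, 9, 14, 23, 0, 12), 5.16),
    (datetime(2007, 9, 14, 23, 2, 12), 5.54),
    (datetime(2007, 9, 14, 23, 4, 12), 5.30),
    (datetime(2007, 9, 14, 23, 6, 12), 5.20),
]

for start, avg in average_by_bucket(rows).items():
    print(start, avg)
```

The six sample rows fall into the buckets starting at 22:55, 23:00 and 23:05, holding two, three and one observation respectively, so the uneven number of observations per bucket is absorbed by the average itself.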

Source

2012-10-22 11:16:26

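By far the simplest option is to create a reference table. In that table you store the intervals you are interested in (adapt this to your own RDBMS's date notation):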

CREATE TABLE interval (
    start_time DATETIME, 
    cease_time DATETIME 
); 
INSERT INTO interval SELECT '2012-10-22 12:00', '2012-10-22 12:05'; 
INSERT INTO interval SELECT '2012-10-22 12:05', '2012-10-22 12:10'; 
INSERT INTO interval SELECT '2012-10-22 12:10', '2012-10-22 12:15'; 
INSERT INTO interval SELECT '2012-10-22 12:15', '2012-10-22 12:20'; 
INSERT INTO interval SELECT '2012-10-22 12:20', '2012-10-22 12:25'; 
INSERT INTO interval SELECT '2012-10-22 12:25', '2012-10-22 12:30'; 
INSERT INTO interval SELECT '2012-10-22 12:30', '2012-10-22 12:35'; 
INSERT INTO interval SELECT '2012-10-22 12:35', '2012-10-22 12:40';

Then you just join and aggregate...

SELECT 
    interval.start_time, 
    AVG(observation.value) 
FROM 
    interval 
LEFT JOIN 
    observation 
    ON observation.timestamp >= interval.start_time 
    AND observation.timestamp < interval.cease_time 
GROUP BY 
    interval.start_time

Note: you only need to create and populate the interval table once; after that you can reuse it as many times as you like.
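The join condition uses half-open intervals (timestamp >= start_time and timestamp < cease_time), and the LEFT JOIN keeps intervals that contain no observations. A minimal Python sketch of that behaviour follows, under stated assumptions: `make_intervals` and `average_per_interval` are hypothetical helpers, and the three observations are invented purely to show one empty interval.

```python
from datetime import datetime, timedelta


def make_intervals(start, end, step=timedelta(minutes=5)):
    """Yield half-open (start_time, cease_time) pairs covering [start, end)."""
    current = start
    while current < end:
        yield current, current + step
        current += step


def average_per_interval(intervals, observations):
    """Average the values whose timestamp lies in [start_time, cease_time).

    An interval without observations maps to None, which mirrors AVG over a
    LEFT JOIN that matched no rows (NULL in SQL).
    """
    result = {}
    for start_time, cease_time in intervals:
        vals = [v for ts, v in observations if start_time <= ts < cease_time]
        result[start_time] = sum(vals) / len(vals) if vals else None
    return result


# Invented observations, for illustration only.
observations = [
    (datetime(2012, 10, 22, 12, 1), 4.0),
    (datetime(2012, 10, 22, 12, 3), 6.0),
    (datetime(2012, 10, 22, 12, 12), 5.0),
]
intervals = list(
    make_intervals(datetime(2012, 10, 22, 12, 0), datetime(2012, 10, 22, 12, 15))
)
averages = average_per_interval(intervals, observations)
for start_time, avg in averages.items():
    print(start_time, avg)
```

Because the upper bound is exclusive, an observation stamped exactly 12:05:00 would belong to the 12:05 interval only, never to both neighbours.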

Source

2012-10-22 09:51:22 MatBailie

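Why complicate the insert with 'insert ... select'? A simple 'values' clause is more direct. –

I tend to agree with @a_horse_with_no_name; 'insert ... select' is odd. A "VALUES ('first', 'row'), ('second', 'row');" list is clearer and simpler. That said, generating the values by hand is odd anyway; you can use 'generate_series' to add minute intervals to a base date. –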

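Edit: I did some more thinking about this and realized that you can't just go from 2 minutes to 5 minutes. It doesn't add up. I will follow up on this, but the code below does work once you have some 1-minute data to aggregate!

If you need the data in a "beginning" format, you can use the code inside this function directly, or simply create the function in your database for easier access: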

CREATE OR REPLACE FUNCTION dev.beginning_datetime_floor(timestamp without time zone, 
integer) /* switch out 'dev' with your schema name */ 
RETURNS timestamp without time zone AS 
$BODY$ 
SELECT 
date_trunc('minute',timestamp with time zone 'epoch' + 
floor(extract(epoch from $1)/($2*60))*$2*60 
* interval '1 second') at time zone 'CST6CDT' /* change this to your time zone */ 
$BODY$ 
LANGUAGE sql VOLATILE;

You give it the integer number of minutes you want to aggregate on (use 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, or 30). Here are a couple of results:

select dev.beginning_datetime_floor('2012-01-01 02:02:21',2)

='2012-01-01 02:02:00'

select dev.beginning_datetime_floor('2012-01-01 02:02:21',5)

='2012-01-01 02:00:00'

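Just test it out, and add or subtract time with the built-in timestamp functions to handle beginning versus ending timestamps.

Once you have the timestamp you want, do what Craig said and GROUP BY that timestamp, combined with your desired aggregate functions (probably averages).

You can test and tweak it with: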

date_trunc('minute',timestamp with time zone 'epoch' + 
floor(extract(epoch from your_datetime)/(interval_minutes*60))*interval_minutes*60 
* interval '1 second') at time zone 'CST6CDT' /* change this to your time zone */

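It may turn out that you want to average the timestamps instead, for example if your interval durations are volatile. For that, you can make a similar function that rounds the timestamp rather than flooring it.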
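As a rough illustration of rounding instead of flooring, here is a minimal Python sketch. It is not the SQL function from this answer: `round_to_interval` is a hypothetical helper that takes the seconds into account and rounds ties upward, both of which are assumptions.

```python
from datetime import datetime, timedelta


def round_to_interval(ts, minutes):
    """Round a naive timestamp to the nearest multiple of `minutes` in its day."""
    width = minutes * 60
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = (ts - midnight).total_seconds()
    # Add half a bucket and floor, so exact ties round up (an assumption).
    rounded = int((elapsed + width / 2) // width) * width
    return midnight + timedelta(seconds=rounded)


print(round_to_interval(datetime(2007, 9, 14, 23, 2, 12), 5))  # nearer to 23:00
print(round_to_interval(datetime(2007, 9, 14, 23, 4, 12), 5))  # nearer to 23:05
```

With flooring, both of these timestamps would land in the 23:00 bucket; with rounding, the second one moves to 23:05, which changes which observations are averaged together.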

Source

2012-10-22 20:12:37 ideamotor

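OK, so this is just one way of handling it. I hope it gets you thinking about how to transform the data for your analysis needs.

There is one prerequisite for testing this code: you need a table containing every possible 1-minute timestamp. There are many ways to get one; I simply used what I had available, which is a table, dim_time, with one row per minute from (00:01:00) to (23:59:00), and another table containing every possible date (dim_date). When you join these (on 1=1) you get every possible minute for every possible day.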

--first you need to create some functions I'll use later 
--credit to this first function goes to David Walling 
CREATE OR REPLACE FUNCTION dev.beginning_datetime_floor(timestamp without time zone, integer) 
    RETURNS timestamp without time zone AS 
$BODY$ 
SELECT 
date_trunc('minute',timestamp with time zone 'epoch' + 
    floor(extract(epoch from $1)/($2*60))*$2*60 
* interval '1 second') at time zone 'CST6CDT' 
$BODY$ 
    LANGUAGE sql VOLATILE; 

--the following function is what I described on my previous post 
CREATE OR REPLACE FUNCTION dev.round_minutes(timestamp without time zone, integer) 
    RETURNS timestamp without time zone AS 
$BODY$ 
    SELECT date_trunc('hour', $1) + cast(($2::varchar||' min') as interval) * round(date_part('minute',$1)::float/cast($2 as float)) 
$BODY$ 
    LANGUAGE sql VOLATILE; 

--let's load the data into a temp table, I added some data points. note: i got rid of the partial seconds 
SELECT cast(timestamp_original as timestamp) as timestamp_original, datapoint INTO TEMPORARY TABLE timestamps_second2 
FROM 
(
SELECT '2007-09-14 22:56:12' as timestamp_original, 0 as datapoint 
UNION 
SELECT '2007-09-14 22:58:12' as timestamp_original, 1 as datapoint 
UNION 
SELECT '2007-09-14 23:00:12' as timestamp_original, 10 as datapoint 
UNION 
SELECT '2007-09-14 23:02:12' as timestamp_original, 100 as datapoint 
UNION 
SELECT '2007-09-14 23:04:12' as timestamp_original, 1000 as datapoint 
UNION 
SELECT '2007-09-14 23:06:12' as timestamp_original, 10000 as datapoint 
) as data;

--this is the bit of code you'll have to replace with your implementation of getting all possible minutes 
--you could make some sequence of timestamps in R, or simply make the timestamps in Excel to test out the rest of the code 
--the result of the query is simply '2007-09-14 00:00:00' through '2007-09-14 23:59:00' 
SELECT * INTO TEMPORARY TABLE possible_timestamps 
FROM 
(
select the_date + beginning_minute as minute_timestamp 
FROM datawarehouse.dim_date as dim_date 
JOIN datawarehouse.dim_time as dim_time 
ON 1=1 
where dim_date.the_date = '2007-09-14' 
group by the_date, beginning_minute 
order by the_date, beginning_minute 
) as data;

--round to nearest minute (be sure to think about how this might change your results)
SELECT * INTO TEMPORARY TABLE rounded_timestamps2 
FROM 
(
SELECT dev.round_minutes(timestamp_original,1) as minute_timestamp_rounded, datapoint 
from timestamps_second2 
) as data;

--let's join what minutes we have data for versus the possible minutes 
--I used some subqueries so when you select all from the table you'll see the important part (not needed) 
SELECT * INTO TEMPORARY TABLE joined_with_possibles 
FROM 
(
SELECT * 
FROM 
(
SELECT *, (MIN(minute_timestamp_rounded) OVER()) as min_time, (MAX(minute_timestamp_rounded) OVER()) as max_time 
FROM possible_timestamps as t1 
LEFT JOIN rounded_timestamps2 as t2 
ON t1.minute_timestamp = t2.minute_timestamp_rounded 
ORDER BY t1.minute_timestamp asc 
) as inner_query 
WHERE minute_timestamp >= min_time 
AND minute_timestamp <= max_time 
) as data;

--here's the tricky part that might not suit your needs, but it's one method 
--if it's missing a value it grabs the previous value 
--if it's missing the prior value it grabs the one before that, otherwise it's null 
--best practice would be run another case statement with 0,1,2 specifying which point was pulled, then you can count those when you aggregate 
SELECT * INTO TEMPORARY TABLE shifted_values 
FROM 
(
SELECT 
*, 
case 
when datapoint is not null then datapoint 
when datapoint is null and (lag(datapoint,1) over (order by minute_timestamp asc)) is not null 
    then lag(datapoint,1) over (order by minute_timestamp asc) 
when datapoint is null and (lag(datapoint,1) over (order by minute_timestamp asc)) is null and (lag(datapoint,2) over (order by minute_timestamp asc)) is not null 
    then lag(datapoint,2) over (order by minute_timestamp asc) 
else null end as last_good_value 
from joined_with_possibles 
ORDER BY minute_timestamp asc 
) as data;

--now we use the function from my previous post to make the timestamps to aggregate on 
SELECT * INTO TEMPORARY TABLE shifted_values_with_five_minute 
FROM 
(
SELECT *, dev.beginning_datetime_floor(minute_timestamp,5) as five_minute_timestamp 
FROM shifted_values 
) as data;

--finally we aggregate 
--average the gap-filled column; AVG(datapoint) would ignore the filled-in values
SELECT 
AVG(last_good_value) as avg_datapoint, five_minute_timestamp 
FROM shifted_values_with_five_minute 
GROUP BY five_minute_timestamp;
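The gap-filling CASE expression in the code above can be hard to follow, so here is a minimal Python sketch of the same rule: take the value itself if present, otherwise the original value one row back, otherwise the original value two rows back, otherwise nothing. `fill_from_previous` is a hypothetical helper, and the per-minute list is made up from a few of the datapoints used above.

```python
def fill_from_previous(values, max_lag=2):
    """Fill None entries from the original value 1..max_lag positions earlier.

    Like lag(datapoint, n) in the SQL, the lookup reads the original column,
    not the already-filled one, so a gap longer than max_lag stays None.
    """
    filled = []
    for i, value in enumerate(values):
        if value is None:
            for lag in range(1, max_lag + 1):
                if i - lag >= 0 and values[i - lag] is not None:
                    value = values[i - lag]
                    break
        filled.append(value)
    return filled


# One entry per minute; None marks a minute without an observation (made up).
per_minute = [0, None, 1, None, 10, None, None, None, 100]
print(fill_from_previous(per_minute))
```

In this sketch the third missing minute in a row stays empty, because neither of the two preceding original values exists, which matches the "otherwise it's null" branch of the CASE expression.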

Source

2012-10-22 21:27:46 ideamotor
