好的,所以這只是一種處理方法。我希望這能讓你思考如何爲你的分析需求轉換數據。
測試此代碼有一個先決條件。你需要有一張包含所有可能的1分鐘時間戳的表格。有很多方法可以解決這個問題,我只是使用我可用的東西,它是一張表:dim_time,每分鐘(00:01:00)到(23:59:00),另一張表包含所有可能的日期(dim_date)。當你加入這些(1 = 1)時,你會在所有可能的日子裏得到所有可能的分鐘數。
--first you need to create some functions I'll use later
--credit to this first function goes to David Walling
CREATE OR REPLACE FUNCTION dev.beginning_datetime_floor(timestamp without time zone, integer)
RETURNS timestamp without time zone AS
$BODY$
SELECT
date_trunc('minute',timestamp with time zone 'epoch' +
floor(extract(epoch from $1)/($2*60))*$2*60
* interval '1 second') at time zone 'CST6CDT'
$BODY$
LANGUAGE sql VOLATILE;
--the following function is what I described on my previous post
CREATE OR REPLACE FUNCTION dev.round_minutes(timestamp without time zone, integer)
RETURNS timestamp without time zone AS
$BODY$
SELECT date_trunc('hour', $1) + cast(($2::varchar||' min') as interval) * round(date_part('minute',$1)::float/cast($2 as float))
$BODY$
LANGUAGE sql VOLATILE;
--let's load the data into a temp table, I added some data points. note: i got rid of the partial seconds
SELECT cast(timestamp_original as timestamp) as timestamp_original, datapoint INTO TEMPORARY TABLE timestamps_second2
FROM
(
SELECT '2007-09-14 22:56:12' as timestamp_original, 0 as datapoint
UNION
SELECT '2007-09-14 22:58:12' as timestamp_original, 1 as datapoint
UNION
SELECT '2007-09-14 23:00:12' as timestamp_original, 10 as datapoint
UNION
SELECT '2007-09-14 23:02:12' as timestamp_original, 100 as datapoint
UNION
SELECT '2007-09-14 23:04:12' as timestamp_original, 1000 as datapoint
UNION
SELECT '2007-09-14 23:06:12' as timestamp_original, 10000 as datapoint
) as data
--this is the bit of code you'll have to replace with your implementation of getting all possible minutes
--you could make some sequence of timestamps in R, or simply make the timestamps in Excel to test out the rest of the code
--the result of the query is simply '2007-09-14 00:00:00' through '2007-09-14 23:59:00'
SELECT * INTO TEMPORARY TABLE possible_timestamps
FROM
(
select the_date + beginning_minute as minute_timestamp
FROM datawarehouse.dim_date as dim_date
JOIN datawarehouse.dim_time as dim_time
ON 1=1
where dim_date.the_date = '2007-09-14'
group by the_date, beginning_minute
order by the_date, beginning_minute
) as data
--round to nearest minute (be sure to think about how this might change your results
SELECT * INTO TEMPORARY TABLE rounded_timestamps2
FROM
(
SELECT dev.round_minutes(timestamp_original,1) as minute_timestamp_rounded, datapoint
from timestamps_second2
) as data
--let's join what minutes we have data for versus the possible minutes
--I used some subqueries so when you select all from the table you'll see the important part (not needed)
SELECT * INTO TEMPORARY TABLE joined_with_possibles
FROM
(
SELECT *
FROM
(
SELECT *, (MIN(minute_timestamp_rounded) OVER()) as min_time, (MAX(minute_timestamp_rounded) OVER()) as max_time
FROM possible_timestamps as t1
LEFT JOIN rounded_timestamps2 as t2
ON t1.minute_timestamp = t2.minute_timestamp_rounded
ORDER BY t1.minute_timestamp asc
) as inner_query
WHERE minute_timestamp >= min_time
AND minute_timestamp <= max_time
) as data
--here's the tricky part that might not suit your needs, but it's one method
--if it's missing a value it grabs the previous value
--if it's missing the prior value it grabs the one before that, otherwise it's null
--best practice would be run another case statement with 0,1,2 specifying which point was pulled, then you can count those when you aggregate
SELECT * INTO TEMPORARY TABLE shifted_values
FROM
(
SELECT
*,
case
when datapoint is not null then datapoint
when datapoint is null and (lag(datapoint,1) over (order by minute_timestamp asc)) is not null
then lag(datapoint,1) over (order by minute_timestamp asc)
when datapoint is null and (lag(datapoint,1) over (order by minute_timestamp asc)) is null and (lag(datapoint,2) over (order by minute_timestamp asc)) is not null
then lag(datapoint,2) over (order by minute_timestamp asc)
else null end as last_good_value
from joined_with_possibles
ORDER BY minute_timestamp asc
) as data
--now we use the function from my previous post to make the timestamps to aggregate on
SELECT * INTO TEMPORARY TABLE shifted_values_with_five_minute
FROM
(
SELECT *, dev.beginning_datetime_floor(minute_timestamp,5) as five_minute_timestamp
FROM shifted_values
) as data
--finally we aggregate
SELECT
AVG(datapoint) as avg_datapoint, five_minute_timestamp
FROM shifted_values_with_five_minute
GROUP BY five_minute_timestamp
發佈樣本數據:你有什麼,你需要什麼。把它寫成插入語句以便於測試樣本。另外,讓我們知道您使用的數據庫品牌。 – danihp
@danihp數據示例:[1 2007-09-14 22:56:12 5.39 2 2007-09-14 22:58:12 5.34 3 2007-09-14 23:00:12 5.16 4 2007-09 -14 23:02:12 5.54 5 2007-09-14 23:04:12 5.30 6 2007-09-14 23:06:12 5.20]預計結果:1 2007-09-14 23:00 5.29 2 2007-09-14 23:06 5.34,我正在使用PostgreSQL –
@aliamidi - 你真的應該在問題中提供那種信息,而不是評論。請參閱我對您提出的問題的編輯...另外,請您解釋爲什麼您的輸出是您的預期?爲什麼第二個記錄是'23:06'而不是'23:05'?預期的'5.34'從哪裏來? – MatBailie