2017-06-18 108 views
0

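Computing the sum of cumulative products over a window frame

I want to estimate how my seasonal forecast differs from the actual data. I have the following dataset: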

day   real_revenue historical_coeff 
01/01/2017 100    1.1 
01/02/2017 105    0.98 
01/03/2017 109    1.05 
01/04/2017 107    1.07 
01/05/2017 90    1 
01/06/2017 120    0.95 
01/07/2017 98    0.99 

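On 01/01/2017 the revenue is 100. The seasonal forecast takes each day's coefficient in turn and applies it to the current revenue, so it predicts 100*1.1 = 110 for 01/02/2017, 110*0.98 = 107.8 for 01/03/2017, and so on. The forecasted remaining revenue is then the sum over all forecasted days. For example, for 01/01/2017, after applying the daily coefficients, the sum is 688.274235.

For the next day, 01/02/2017, we start from the value 105. So we predict 105*0.98 = 102.9 for 01/03/2017, then 102.9*1.05 = 108.045 for 01/04/2017, and so on. The total forecasted remaining revenue will be 531.2557215.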
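The arithmetic in the two examples above can be checked with a short script. This is a minimal sketch in Python rather than SQL; the function name `remaining_forecast` and the `use_last_coeff` flag are made up for illustration, and the data is copied from the table in the question.

```python
revenue = [100, 105, 109, 107, 90, 120, 98]
coeff = [1.1, 0.98, 1.05, 1.07, 1.0, 0.95, 0.99]

def remaining_forecast(start, use_last_coeff=False):
    """Sum of the forecasts for the days after `start` (0-based index).

    By default the last coefficient is not used, which is the reading
    that reproduces the 688.274235 quoted for 01/01/2017.
    """
    stop = len(coeff) if use_last_coeff else len(coeff) - 1
    value = float(revenue[start])
    total = 0.0
    for k in range(start, stop):
        value *= coeff[k]   # forecast for day k + 1
        total += value
    return total

for day in range(len(revenue)):
    print(day + 1, round(remaining_forecast(day), 7))
```

With these inputs, `remaining_forecast(0)` reproduces the 688.274235 quoted for 01/01/2017. For 01/02/2017 the same recurrence comes out at roughly 551.99 rather than the 531.2557215 quoted, so that figure may rest on a different convention.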

The table I would like to end up with looks like this:

day   forecasted_total_remaining_revenue 
01/01/2017 688.274235 
01/02/2017 531.2557 
01/03/2017 ... 
01/04/2017 ... 
01/05/2017 ... 
01/06/2017 ... 
01/07/2017 ... 

Essentially, I need the sum of cumulative products for each day, i.e. a + a*b + a*b*c + a*b*c*d + ...

Is it possible to write such a query in Vertica or SQL?
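One way to see the structure of a + a*b + a*b*c + ... is that it can be built from the last day backwards: if S is the sum that starts at the following coefficient, then the sum that starts one step earlier is c * (1 + S). The sketch below is in Python, not Vertica SQL, and the name `sums_of_cumulative_products` is hypothetical; it only illustrates the recurrence that a query would need to reproduce.

```python
def sums_of_cumulative_products(coeffs):
    """For each i, return c[i] + c[i]*c[i+1] + ... + c[i]*...*c[-1]."""
    sums = [0.0] * len(coeffs)
    running = 0.0
    # Walk from the last coefficient to the first: S[i] = c[i] * (1 + S[i+1]).
    for i in range(len(coeffs) - 1, -1, -1):
        running = coeffs[i] * (1.0 + running)
        sums[i] = running
    return sums

revenue = [100, 105, 109, 107, 90, 120, 98]
coeff = [1.1, 0.98, 1.05, 1.07, 1.0, 0.95, 0.99]

# Assumption: the last coefficient is not used, so the final day
# has nothing left to forecast and gets a factor of 0.
factors = sums_of_cumulative_products(coeff[:-1]) + [0.0]
forecast = [r * f for r, f in zip(revenue, factors)]
print(forecast)
```

Each day's total is then just that day's `real_revenue` times its factor, computed in a single backward pass.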

+0

Shouldn't the result for '01/01/2017' be '802.18129365' according to the logic explained?

+0

You do get 802 if you include the last coefficient. In my case I described only 7 days, so the last coefficient is not used.

+0

What does "only 7 days" mean? The question does not mention this.

Answers

1

You can use ln() and exp() to get the product of the remaining values:

select t.*, 
     exp(sum(ln(historical_coeff)) over (order by day desc)) as factor 
from t; 

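Of course, the expression is more complicated if historical_coeff is ever negative or zero.

Then you can take a cumulative sum of this factor to get the desired total: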
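The ln()/exp() trick works because the sum of logarithms is the logarithm of the product. Below is a small Python sketch of the same idea, including one common way to handle zero and negative values by tracking the sign separately; the function name `product_via_logs` is an illustrative assumption, not part of any library.

```python
import math

def product_via_logs(values):
    """Multiply values using exp(sum(ln(|v|))), handling zero and sign."""
    if any(v == 0 for v in values):
        return 0.0  # ln(0) is undefined, but the product is simply zero
    negatives = sum(1 for v in values if v < 0)
    magnitude = math.exp(sum(math.log(abs(v)) for v in values))
    # An odd number of negative factors makes the product negative.
    return -magnitude if negatives % 2 else magnitude

print(product_via_logs([1.1, 0.98, 1.05]))
print(product_via_logs([-2.0, 3.0]))
```

A SQL version would need the same two pieces: a zero check and a count of negative factors wrapped around the logarithm of the absolute value.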

select t.*, 
     real_revenue * sum(factor) over (order by day desc) as forecasted_total_remaining_revenue 
from (select t.*, 
      exp(sum(ln(historical_coeff)) over (order by day desc)) as factor 
     from t 
    ) t 
+0

You have to add 'ROWS UNBOUNDED PRECEDING', because the default 'RANGE' returns a wrong answer when consecutive rows have the same 'historical_coeff' (and it is also less efficient). – dnoeth
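The difference between the two frame modes can be simulated outside the database. This is a minimal sketch in Python, assuming the standard SQL rule that a 'RANGE' frame treats rows tied on the ORDER BY key as peers and includes all of them; the helper names and the sample keys are invented for illustration.

```python
def running_sum_rows(values):
    """ROWS UNBOUNDED PRECEDING: each row sees itself and the rows before it."""
    out, total = [], 0.0
    for v in values:
        total += v
        out.append(total)
    return out

def running_sum_range(keys, values):
    """RANGE UNBOUNDED PRECEDING: each row also sees later rows tied on the key.

    Assumes `keys` is already sorted in ascending order.
    """
    return [sum(v for k2, v in zip(keys, values) if k2 <= k) for k in keys]

keys = [1, 2, 2, 3]            # the two middle rows tie on the ordering key
values = [10.0, 20.0, 30.0, 40.0]
print(running_sum_rows(values))         # [10.0, 30.0, 60.0, 100.0]
print(running_sum_range(keys, values))  # [10.0, 60.0, 60.0, 100.0]
```

Under that assumption the two modes only diverge where the ordering key repeats, so spelling out the 'ROWS' frame guards against duplicate keys in addition to the efficiency concern the comment raises.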

+0

Gordon, I don't think this gives the required sum of cumulative products. For example, you would be adding '1.1 * 0.98 * 1.05 * 1.07 * 1 * 0.95 * 0.99' for the date 01/01/2017, '0.98 * 1.05 * 1.07 * 1 * 0.95 * 0.99' for 01/02/2017, and so on, but the required sum for 01/01/2017 is '1.1 + 1.1 * 0.98 + 1.1 * 0.98 * 1.05 + 1.1 * 0.98 * 1.05 * 1.07 + 1.1 * 0.98 * 1.05 * 1.07 * 1 + 1.1 * 0.98 * 1.05 * 1.07 * 1 * 0.95 + 1.1 * 0.98 * 1.05 * 1.07 * 1 * 0.95 * 0.99'.
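The two quantities contrasted in this comment can be compared numerically. The sketch below is plain Python with made-up variable names, using the coefficients from the question; it is only meant to show that summing "product from this row to the end" is not the same as summing products that all start at the first row.

```python
import math

coeff = [1.1, 0.98, 1.05, 1.07, 1.0, 0.95, 0.99]
n = len(coeff)

# Running sum of "product from row j to the last row", as seen from the first row.
sum_of_suffix_products = sum(math.prod(coeff[j:]) for j in range(n))

# What the question asks for: every product starts at the first row.
sum_of_prefix_products = sum(math.prod(coeff[:j + 1]) for j in range(n))

print(sum_of_suffix_products, sum_of_prefix_products)
```

For this data the first sum is about 7.11 and the second about 8.02, so the two approaches give different totals for 01/01/2017.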

0

In regular SQL (the syntax shown here is for SQL Server), this can be done with a recursive CTE (as long as the DBMS supports them).

with rownums as (
    select t.*, row_number() over (order by dt) as rn
    from tbl t
)
, cte as (
    -- anchor row: first day's revenue times its coefficient
    select rn, dt, real_revenue, historical_coeff,
           cast(real_revenue * historical_coeff as decimal(38,10)) as res
    from rownums
    where rn = 1
    union all
    -- recursive step: multiply the running product by the next day's coefficient
    select t.rn, t.dt, t.real_revenue, t.historical_coeff,
           cast(c.res * t.historical_coeff as decimal(38,10))
    from rownums t
    join cte c on t.rn = c.rn + 1
)
select dt, sum(res) over (order by dt desc) as forecasted_remaining_revenue
from cte

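The logic for excluding the last coefficient is unclear. This sums all the cumulative products from the given date through the last date.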
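To see what the recursive query produces, its two stages can be mirrored step by step. This is a rough Python sketch of the same logic, not a replacement for the SQL; the variable names are illustrative and the data comes from the question.

```python
revenue = [100, 105, 109, 107, 90, 120, 98]
coeff = [1.1, 0.98, 1.05, 1.07, 1.0, 0.95, 0.99]

# Recursive part: res[0] = revenue[0] * coeff[0], then res[n] = res[n-1] * coeff[n].
res = []
running = float(revenue[0])
for c in coeff:
    running *= c
    res.append(running)

# Final select: sum(res) over (order by dt desc), i.e. a suffix sum per day.
forecast = []
total = 0.0
for value in reversed(res):
    total += value
    forecast.append(total)
forecast.reverse()

print(forecast)
```

The first value matches the 802.18129365 mentioned in the comments on the question. Later rows keep chaining from the first day's revenue rather than restarting from each day's own real_revenue, which may or may not be what is wanted.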

Sample Demo

0

I think you are looking for something like this (you may need to adjust the number of days in the interval):

SELECT 
    day, 
    SUM (frev) OVER (ORDER BY day 
     RANGE BETWEEN CURRENT ROW AND INTERVAL '5 DAYS' FOLLOWING 
    ) AS forecasted_total_remaining_revenue 
FROM (
    SELECT 
     day, 
     real_revenue * 
      EXP(SUM (LN(historical_coeff)) OVER(
       ORDER BY day 
       RANGE BETWEEN CURRENT ROW AND INTERVAL '5 DAYS' FOLLOWING 
       ) 
      ) AS frev 
    FROM 
     public.t1 
) a 
;