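Divide column value by number of rows within a date range

2017-08-15

I have a table in Impala with the following fields: Campaign ID, Account, Start Date, End Date, Transaction Date and Revenue. Several campaigns share the same account and revenue value. I want to split each revenue value among the campaigns of that account whose 36-month window [Start_Date, Start_Date + 36 months] contains the Transaction_Date.
Sample table: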

Campaign | Account | Start Date | End Date | Trans. Date | Revenue 
     1  | 1234 | 13-05-17 | 13-06-17 | 19-10-17 | 200 
     2  | 1234 | 14-01-16 | 14-02-16 | 19-10-17 | 200 
     2  | 5678 | 14-01-16 | 14-02-16 | 07-02-16 | 200 
     3  | 2345 | 20-05-15 | 20-07-15 | 22-05-15 | 300 
     4  | 1234 | 15-10-13 | 15-11-13 | 19-10-17 | 200 
     4  | 2345 | 15-10-13 | 15-11-13 | 22-05-15 | 300 

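Here, the revenue for account 1234 should be split between campaigns 1 and 2, but not campaign 4, because the transaction date falls more than 36 months after campaign 4 started. The revenue for account 2345, on the other hand, should be split between campaigns 3 and 4.
So the resulting table should be: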

Campaign | Account | Start Date | End Date | Trans. Date | Revenue | Avg Revenue 
     1  | 1234 | 13-05-17 | 13-06-17 | 19-10-17 | 200  | 100 
     2  | 1234 | 14-01-16 | 14-02-16 | 19-10-17 | 200  | 100 
     2  | 5678 | 14-01-16 | 14-02-16 | 07-02-16 | 200  | 200 
     3  | 2345 | 20-05-15 | 20-07-15 | 22-05-15 | 300  | 150 
     4  | 1234 | 15-10-13 | 15-11-13 | 19-10-17 | 200  | NULL 
     4  | 2345 | 15-10-13 | 15-11-13 | 22-05-15 | 300  | 150 

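EDIT:
Essentially, I want to do the following:
1. For each row, get all rows of that account whose trans_date falls between the start date and the start date + 3 years.
2. Divide the revenue in each row by that number of rows.
I tried to do this with partitioning, but I'm not sure how to create a partition with a variable range based on date values.
Hope this makes it clearer.
Thanks!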
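The two steps in the edit can be sketched in plain Python, independent of any database. This is a minimal sketch. It assumes the DD-MM-YY sample dates decode to the years shown, that the window bounds are inclusive, and that campaign 4's second row belongs to account 2345 (as in the expected output). The helper names `add_months`, `in_window` and `avg_revenue` are hypothetical.

```python
import calendar
from datetime import date

# Sample rows from the question: (campaign, account, start_date, trans_date, revenue).
# End dates are left out because the rule only uses start and transaction dates.
ROWS = [
    (1, 1234, date(2017, 5, 13), date(2017, 10, 19), 200),
    (2, 1234, date(2016, 1, 14), date(2017, 10, 19), 200),
    (2, 5678, date(2016, 1, 14), date(2016, 2, 7), 200),
    (3, 2345, date(2015, 5, 20), date(2015, 5, 22), 300),
    (4, 1234, date(2013, 10, 15), date(2017, 10, 19), 200),
    (4, 2345, date(2013, 10, 15), date(2015, 5, 22), 300),
]


def add_months(d, months):
    """Shift d by whole calendar months, clamping the day to the target month's length."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def in_window(start, trans, months=36):
    """True if trans falls in [start, start + months] (assumed inclusive bounds)."""
    return start <= trans <= add_months(start, months)


def avg_revenue(rows):
    # Step 1: count the qualifying rows per account.
    counts = {}
    for _, account, start, trans, _ in rows:
        if in_window(start, trans):
            counts[account] = counts.get(account, 0) + 1
    # Step 2: divide each qualifying row's revenue by its account's count;
    # rows outside the window get None (NULL in SQL terms).
    result = []
    for campaign, account, start, trans, revenue in rows:
        share = revenue / counts[account] if in_window(start, trans) else None
        result.append((campaign, account, share))
    return result


for row in avg_revenue(ROWS):
    print(row)
```

Row (4, 1234) comes out as None because its transaction is more than 36 months after the campaign start. This matches the NULL in the expected table.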


Which RDBMS? What have you tried? Help us help you. – CGritton


對不起。我正在使用Impala .. 我已經嘗試過使用分區查詢,但我不確定如何去處理各種日期範圍.. –

Answer


這將在甲骨文的工作,這個概念應該能夠適應的Postgres ..

drop table test; 

create table test as 
select 1 as Campaign, 1234 as Account, to_date('13-05-17', 'DD-MM-YY') as Start_Date, to_date('13-06-17', 'DD-MM-YY') as End_Date, to_date('19-10-17', 'DD-MM-YY') as Trans_Date, 200 as Revenue from dual union all 
select 2 as Campaign, 1234 as Account, to_date('14-01-16', 'DD-MM-YY') as Start_Date, to_date('14-02-16', 'DD-MM-YY') as End_Date, to_date('19-10-17', 'DD-MM-YY') as Trans_Date, 200 as Revenue from dual union all 
select 2 as Campaign, 5678 as Account, to_date('14-01-16', 'DD-MM-YY') as Start_Date, to_date('14-02-16', 'DD-MM-YY') as End_Date, to_date('07-02-16', 'DD-MM-YY') as Trans_Date, 200 as Revenue from dual union all 
select 3 as Campaign, 2345 as Account, to_date('20-05-15', 'DD-MM-YY') as Start_Date, to_date('20-07-15', 'DD-MM-YY') as End_Date, to_date('22-05-15', 'DD-MM-YY') as Trans_Date, 300 as Revenue from dual union all 
select 4 as Campaign, 1234 as Account, to_date('15-10-13', 'DD-MM-YY') as Start_Date, to_date('15-11-13', 'DD-MM-YY') as End_Date, to_date('19-10-17', 'DD-MM-YY') as Trans_Date, 200 as Revenue from dual union all 
select 4 as Campaign, 2345 as Account, to_date('15-10-13', 'DD-MM-YY') as Start_Date, to_date('15-11-13', 'DD-MM-YY') as End_Date, to_date('22-05-15', 'DD-MM-YY') as Trans_Date, 300 as Revenue from dual 
; 

select 
    a.* 
    -- rows whose transaction falls outside the 3-year window get a NULL share and are not counted
    ,case when Start_Date + (365 * 3) > Trans_Date then Revenue else null end
        / count(case when Start_Date + (365 * 3) > Trans_Date then 1 else null end)
          over (partition by account) as Avg_Revenue 
from test a 
order by Campaign, Account;
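The same conditional-count window can be tried without an Oracle or Impala instance. The sketch below is a stand-in, not the answer's exact environment:

- It uses Python's built-in `sqlite3` module. Window functions need SQLite 3.25 or later.
- It replaces `Start_Date + (365 * 3)` with SQLite's `date(start_date, '+36 months')`.
- It stores dates as ISO `YYYY-MM-DD` text so that string comparison orders them correctly.

The table mirrors the `test` table above. `run` is a hypothetical helper.

```python
import sqlite3

# Same sample data as the Oracle script, with dates converted to ISO text.
ROWS = [
    (1, 1234, "2017-05-13", "2017-06-13", "2017-10-19", 200),
    (2, 1234, "2016-01-14", "2016-02-14", "2017-10-19", 200),
    (2, 5678, "2016-01-14", "2016-02-14", "2016-02-07", 200),
    (3, 2345, "2015-05-20", "2015-07-20", "2015-05-22", 300),
    (4, 1234, "2013-10-15", "2013-11-15", "2017-10-19", 200),
    (4, 2345, "2013-10-15", "2013-11-15", "2015-05-22", 300),
]

# The CASE yields NULL for rows outside the window, so those rows are neither
# counted nor given a share, but they stay in the result set.
QUERY = """
SELECT campaign, account, revenue,
       CASE WHEN date(start_date, '+36 months') > trans_date THEN revenue END * 1.0
       / COUNT(CASE WHEN date(start_date, '+36 months') > trans_date THEN 1 END)
             OVER (PARTITION BY account) AS avg_revenue
FROM test
ORDER BY campaign, account
"""


def run(query):
    # Build a throwaway in-memory table and return all result rows.
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE test (campaign INTEGER, account INTEGER, start_date TEXT,"
        " end_date TEXT, trans_date TEXT, revenue INTEGER)"
    )
    con.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(query).fetchall()
    con.close()
    return rows


for row in run(QUERY):
    print(row)
```

The `* 1.0` avoids SQLite's integer division. SQLite's month arithmetic handles dates like 31 January + 1 month differently from Oracle's `add_months`, so end-of-month edge cases may not match exactly.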

Hi, thanks. How is this different from using a where clause, as in 'select a.*, revenue/count(campaign) over (partition by account) as avg_revenue from test a'? –


A where clause would exclude the fifth row of the sample data from the result. My approach keeps that row, but with a NULL average revenue. –
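The difference between the two approaches can be checked directly. This minimal sketch runs the WHERE-clause variant discussed in these comments and the CASE-inside-the-window variant on the same sample rows. It uses Python's `sqlite3` (SQLite 3.25+ for window functions), with `date(start_date, '+36 months')` standing in for `start_date + (365 * 3)`. The table and helper names are assumptions.

```python
import sqlite3

ROWS = [
    (1, 1234, "2017-05-13", "2017-06-13", "2017-10-19", 200),
    (2, 1234, "2016-01-14", "2016-02-14", "2017-10-19", 200),
    (2, 5678, "2016-01-14", "2016-02-14", "2016-02-07", 200),
    (3, 2345, "2015-05-20", "2015-07-20", "2015-05-22", 300),
    (4, 1234, "2013-10-15", "2013-11-15", "2017-10-19", 200),
    (4, 2345, "2013-10-15", "2013-11-15", "2015-05-22", 300),
]

# WHERE is applied before the window function, so out-of-window rows vanish entirely.
WHERE_QUERY = """
SELECT campaign, account,
       revenue * 1.0 / COUNT(campaign) OVER (PARTITION BY account) AS avg_revenue
FROM test
WHERE date(start_date, '+36 months') > trans_date
ORDER BY campaign, account
"""

# CASE inside the expression keeps every row; out-of-window rows get NULL.
CASE_QUERY = """
SELECT campaign, account,
       CASE WHEN date(start_date, '+36 months') > trans_date THEN revenue END * 1.0
       / COUNT(CASE WHEN date(start_date, '+36 months') > trans_date THEN 1 END)
             OVER (PARTITION BY account) AS avg_revenue
FROM test
ORDER BY campaign, account
"""


def run(query):
    # Build a throwaway in-memory table and return all result rows.
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE test (campaign INTEGER, account INTEGER, start_date TEXT,"
        " end_date TEXT, trans_date TEXT, revenue INTEGER)"
    )
    con.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(query).fetchall()
    con.close()
    return rows


where_rows = run(WHERE_QUERY)
case_rows = run(CASE_QUERY)
print(len(where_rows), len(case_rows))  # 5 6
print([r for r in case_rows if r not in where_rows])  # [(4, 1234, None)]
```

Both queries give the same shares for the rows inside the window. The only difference is whether the out-of-window row for campaign 4 / account 1234 is dropped or kept with NULL.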


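Oh, OK, thanks. Let me check the results and get back to you. EDIT: I just realized I didn't include the full query in my previous comment. For the record, it is: 'select a.*, revenue/count(campaign) over (partition by account) as avg_revenue from test a where start_date + (365 * 3) > trans_date' –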