2017-03-09 22 views
3

Self-joining a table with a condition: I have a table of the following form.

Table dummy:

e_n t_s item 
a  t1 c 
a  t2 c 
a  t3 c 
a  t4 c 
b  p1 c 
b  p2 c 
b  p3 c 
b  p4 c 

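t1, t2, t3, t4, p1, p2, p3, p4 are timestamps in ascending order. t1, t2, t3, t4 are the ascending timestamps for event_name 'a' (column e_n); p1, p2, p3, p4 are the ascending timestamps for event_name 'b'.

c is the item_number on which the events 'a' and 'b' occur.

I am trying to write a query whose result should look like this: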

e_n1 e_n2 item t_s_1 t_s_2 
a  b  c  t1 p1 
a  b  c  t2 p2 
a  b  c  t3 p3 
a  b  c  t4 p4 

I have tried the following code:

select l.e_n as e_n_1, m.e_n as e_n_2, l.item, l.t_s as t_s_a, 
m.t_s as t_s_b from 
(select * from dummy where e_n = 'a') l 
join 
(select * from dummy where e_n = 'b') m 
on l.item = m.item and l.t_s < m.t_s 

The join condition l.item = m.item is needed because there are many other items (c1, c2, c3) with the same structure.

The result is:

e_n1 e_n2 item t_s_a t_s_b 
a  b  c  t1 p1 
a  b  c  t1 p2 
a  b  c  t1 p3 
a  b  c  t1 p4 
a  b  c  t2 p1 
a  b  c  t2 p2 
a  b  c  t2 p3 

and so on.

How can I get the result I want in an efficient way?

+0

Does your apache-spark-sql support ROW_NUMBER() OVER (ORDER BY t_s) rn? If so, simply full outer join the tables 'l' and 'm' using 'l.rn = m.rn'.

+0

Is this specifically for Amazon Redshift? Or Spark? Could you clarify your tags accordingly?

+0

This is for apache-spark-sql – SpaceOddity

Answers

3
select  min (case when e_n = 'a' then 'a' end) as e_n1 
      ,min (case when e_n = 'b' then 'b' end) as e_n2 
      ,item 
      ,min (case when e_n = 'a' then t_s end) as t_s_1 
      ,min (case when e_n = 'b' then t_s end) as t_s_2 

from  (select  d.* 
         ,row_number() over (partition by item,e_n order by t_s) as rn 

      from  dummy as d 
      ) d 

group by item 
      ,rn 

+------+------+------+-------+-------+
| e_n1 | e_n2 | item | t_s_1 | t_s_2 |
+------+------+------+-------+-------+
| a    | b    | c    | t1    | p1    |
| a    | b    | c    | t2    | p2    |
| a    | b    | c    | t3    | p3    |
| a    | b    | c    | t4    | p4    |
+------+------+------+-------+-------+
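
To see why grouping by item and rn pairs the events up, it can help to run the inner query on its own. The sketch below assumes Spark SQL's inline VALUES syntax. The CTE named dummy is a hypothetical copy of the question's table, and the placeholder strings 't1' to 'p4' stand in for real timestamps (they happen to sort correctly as strings here).

```sql
-- Hypothetical inline copy of the question's table;
-- the string values stand in for real timestamps.
WITH dummy AS (
  SELECT * FROM VALUES
    ('a', 't1', 'c'), ('a', 't2', 'c'), ('a', 't3', 'c'), ('a', 't4', 'c'),
    ('b', 'p1', 'c'), ('b', 'p2', 'c'), ('b', 'p3', 'c'), ('b', 'p4', 'c')
  AS t(e_n, t_s, item)
)
SELECT d.*,
       -- the rank restarts at 1 for every (item, e_n) pair
       row_number() OVER (PARTITION BY item, e_n ORDER BY t_s) AS rn
FROM dummy AS d
ORDER BY rn, e_n;
-- (a, t1) and (b, p1) both get rn = 1, (a, t2) and (b, p2) both get rn = 2, and so on.
```

Each (item, rn) group then holds at most one 'a' row and one 'b' row, so every min(case ...) expression in the outer query just picks the single matching value.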
+0

A kind reminder to accept the answer (by clicking the V mark to its left).

0

First, rank the timestamps of each event in order, then join the two ranked tables on the row number.

Please try the code below.

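-- coalesce() replaces the two-argument isnull(), which Spark SQL does not provide
select l.e_n as e_n_1, m.e_n as e_n_2, coalesce(l.item, m.item) as item, l.t_s as t_s_a, 
    m.t_s as t_s_b from 
    -- partition by item so that the numbering restarts for every item
    (select *, row_number() over (partition by item order by t_s) as rn from dummy where e_n = 'a') l 
    full join 
    (select *, row_number() over (partition by item order by t_s) as rn from dummy where e_n = 'b') m 
    on l.item = m.item and l.rn = m.rn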
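
The rank-then-join idea can be tried on a small inline data set. This is only a sketch under stated assumptions: the integer timestamps, the second item c1, and its extra 'b' event are made up for illustration. The ranks are computed once in a CTE (hypothetically named ranked) and restart for every item and event, since the question mentions several items.

```sql
-- Hypothetical data: integer timestamps, two items,
-- and one extra 'b' event for item c1 with no matching 'a'.
WITH dummy AS (
  SELECT * FROM VALUES
    ('a', 1, 'c'),  ('a', 2, 'c'),  ('b', 3, 'c'),  ('b', 4, 'c'),
    ('a', 5, 'c1'), ('b', 6, 'c1'), ('b', 7, 'c1')
  AS t(e_n, t_s, item)
),
ranked AS (
  -- number the events per item and event name, oldest first
  SELECT *, row_number() OVER (PARTITION BY item, e_n ORDER BY t_s) AS rn
  FROM dummy
)
SELECT l.e_n AS e_n_1, m.e_n AS e_n_2,
       coalesce(l.item, m.item) AS item,   -- item comes from whichever side exists
       l.t_s AS t_s_a, m.t_s AS t_s_b
FROM (SELECT * FROM ranked WHERE e_n = 'a') l
FULL JOIN (SELECT * FROM ranked WHERE e_n = 'b') m
  ON l.item = m.item AND l.rn = m.rn
ORDER BY item, t_s_b;
-- Rows for this data:
--   a     b  c   1     3
--   a     b  c   2     4
--   a     b  c1  5     6
--   NULL  b  c1  NULL  7
```

The last row shows what the full join adds: an event without a partner on the other side is kept, with NULLs in the missing columns. If such rows are not wanted, an inner join on the same condition would drop them.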