2015-06-20

I'm trying to do a simple Hive transformation.

[Image: Simple Hive Transformation]

Can anyone suggest a way to do this? I've tried collect_set, and I'm currently looking at Klout's open-source UDFs.


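The same unit can show up again later; for example, ABC can start at datetime 8 and then start again at datetime 9. We need to keep each unit's time ranges contiguous. FYI, a simple GROUP BY would give the wrong result for this pattern.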
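This requirement is usually called a "gaps and islands" problem: a new range starts whenever the unit changes from one reading to the next, not once per distinct unit. The question's sample tables did not survive, so the readings below are hypothetical (unit names and integer datetimes are invented for illustration). A minimal Python sketch, assuming the rows are already sorted by datetime, collapses each run of consecutive readings into one (unit, start, stop) row. It relies on `itertools.groupby`, which, unlike SQL GROUP BY, only groups adjacent equal keys:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical readings: (datetime, unit), already sorted by datetime.
READINGS = [
    (1, "ABC"),
    (2, "ABC"),
    (3, "XYZ"),
    (4, "XYZ"),
    (5, "ABC"),  # ABC is visited a second time
    (6, "ABC"),
]


def collapse_runs(readings):
    """Return one (unit, start, stop) tuple per run of consecutive readings."""
    runs = []
    # groupby starts a new group every time the key (the unit) changes,
    # so two separate visits to the same unit become two separate groups.
    for unit, group in groupby(readings, key=itemgetter(1)):
        times = [t for t, _ in group]
        runs.append((unit, times[0], times[-1]))
    return runs
```

For the readings above, `collapse_runs(READINGS)` returns `[("ABC", 1, 2), ("XYZ", 3, 4), ("ABC", 5, 6)]`, which keeps the two ABC visits apart.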

Answers


How about using the MIN and MAX functions? I think the following will get you what you need:

SELECT 
    Unit, 
    MIN(datetime) as start, 
    MAX(datetime) as stop 
from table_name 
group by Unit 
; 

Thanks for the reply, but it's not that simple. Imagine different dates and time zones, and the same unit being visited more than once.
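The repeated-visit objection can be checked with a small Python model of what `GROUP BY Unit` with MIN/MAX computes. The readings are hypothetical (the original sample data is not available), with ABC visited twice. The grouped query returns one ABC row spanning 1 to 6, and that row swallows the XYZ visit from 3 to 4:

```python
# Hypothetical readings: (datetime, unit), sorted by datetime.
READINGS = [
    (1, "ABC"),
    (2, "ABC"),
    (3, "XYZ"),
    (4, "XYZ"),
    (5, "ABC"),  # second, separate visit to ABC
    (6, "ABC"),
]


def min_max_per_unit(readings):
    """Mimic SELECT unit, MIN(datetime), MAX(datetime) ... GROUP BY unit."""
    ranges = {}
    for t, unit in readings:
        if unit in ranges:
            lo, hi = ranges[unit]
            ranges[unit] = (min(lo, t), max(hi, t))
        else:
            ranges[unit] = (t, t)
    # One row per distinct unit, no matter how many separate visits it had.
    return sorted((unit, lo, hi) for unit, (lo, hi) in ranges.items())
```

`min_max_per_unit(READINGS)` gives `[("ABC", 1, 6), ("XYZ", 3, 4)]`, so the ABC range overlaps the XYZ range instead of being split into two visits.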


I think this gives you what you want, but I wasn't able to run it and debug it. Good luck!

select start_points.unit 
    , start_time as start 
    , min(stop_time) as stop 
from 
    -- start points: rows whose unit differs from the previous row's unit
    (select * from 
     (select date_time as start_time 
     , unit 
     , lag(unit) over (order by date_time) as previous_unit 
     from table_name 
    ) prev_rows 
     where previous_unit is null or unit <> previous_unit 
) start_points 
left outer join 
    -- stop points: rows whose unit differs from the next row's unit
    (select * from 
     (select date_time as stop_time 
     , unit 
     , lead(unit) over (order by date_time) as next_unit 
     from table_name 
    ) next_rows 
     where next_unit is null or unit <> next_unit 
) stop_points 
on start_points.unit = stop_points.unit 
-- >= so that a run of a single row is paired with itself
where stop_time >= start_time 
group by start_points.unit, start_time 
; 
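The idea behind this query is that a row is a start point when its unit differs from the previous row's unit, and a stop point when it differs from the next row's unit. Each start is then paired with the earliest stop of the same unit at or after it. A minimal Python model of that pairing can help check the logic before running anything in Hive. It assumes hypothetical readings sorted by time and a single sequence, with no per-id partitioning:

```python
# Hypothetical readings: (datetime, unit), sorted by datetime.
READINGS = [
    (1, "ABC"), (2, "ABC"), (3, "XYZ"),
    (4, "XYZ"), (5, "ABC"), (6, "ABC"),
]


def start_stop_ranges(readings):
    """Pair each start point with the earliest stop point of the same unit."""
    units = [u for _, u in readings]
    # Start point: first row, or unit differs from the previous row's unit.
    starts = [
        (u, t) for i, (t, u) in enumerate(readings)
        if i == 0 or u != units[i - 1]
    ]
    # Stop point: last row, or unit differs from the next row's unit.
    stops = [
        (u, t) for i, (t, u) in enumerate(readings)
        if i == len(readings) - 1 or u != units[i + 1]
    ]
    ranges = []
    for unit, start in starts:
        # Join on unit with stop >= start, then take MIN(stop) per start.
        stop = min(t for u, t in stops if u == unit and t >= start)
        ranges.append((unit, start, stop))
    return ranges
```

Here `start_stop_ranges(READINGS)` returns `[("ABC", 1, 2), ("XYZ", 3, 4), ("ABC", 5, 6)]`. The pairing is correct because runs of the same unit never overlap, so the first stop at or after a start always belongs to that start's own run.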

Thanks for the pointer to window functions. Not an exact solution, but it's the right path.


I figured it out. Thanks for the pointer to use window functions.

select * 
from 
    (select *, 
        case 
            -- newest row for this id (no later row to compare with)
            when lag(unit, 1) over (partition by id order by effective_time_ut desc) is NULL then 1 
            -- unit differs from the chronologically next row: last row of a run
            when unit <> lag(unit, 1) over (partition by id order by effective_time_ut desc) then 1 
            -- oldest row for this id
            when lead(unit, 1) over (partition by id order by effective_time_ut desc) is NULL then 1 
            else 0 
        end as different_loc 
    from units_we_care) a 
where different_loc = 1 
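The CASE expression orders each id's rows newest first, so `lag` looks at the chronologically later row and `lead` at the earlier one. A Python model of the same flags, using hypothetical rows for a single id, shows which rows survive the `different_loc=1` filter under that ordering. The survivors are the newest row of each run plus the oldest row overall. In other words, the query keeps run boundaries rather than producing start/stop pairs directly:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (id, effective_time_ut, unit).
ROWS = [
    (7, 1, "ABC"), (7, 2, "ABC"), (7, 3, "XYZ"),
    (7, 4, "XYZ"), (7, 5, "ABC"), (7, 6, "ABC"),
]


def flag_boundaries(rows):
    """Return the rows for which the CASE expression yields different_loc = 1."""
    flagged = []
    # PARTITION BY id
    for _, part in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        # ORDER BY effective_time_ut DESC within the partition.
        ordered = sorted(part, key=itemgetter(1), reverse=True)
        for i, row in enumerate(ordered):
            lag_unit = ordered[i - 1][2] if i > 0 else None
            lead_unit = ordered[i + 1][2] if i + 1 < len(ordered) else None
            # Same three WHEN branches, checked in order.
            if lag_unit is None or row[2] != lag_unit or lead_unit is None:
                flagged.append(row)
    return flagged
```

For these rows, `flag_boundaries(ROWS)` returns `[(7, 6, "ABC"), (7, 4, "XYZ"), (7, 2, "ABC"), (7, 1, "ABC")]`. Those are the last reading of each run (times 6, 4, 2) plus the first reading (time 1).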
create table temptable as select unit, start_date, end_time, row_number() over (order by start_date) as row_num from (select unit, min(date_time) as start_date, max(date_time) as end_time from table_name group by unit) a; 

select a.unit, a.start_date as start_date, nvl(b.start_date, a.end_time) as end_time from temptable a left outer join temptable b on (a.row_num + 1) = b.row_num;