2015-09-03

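Hive select data into an array of structs

I'm trying to figure out how to select data from a flat source in Hive and output it as an array of named structs. Here is what I'm looking for: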

Sample data:

house_id,first_name,last_name 
1,bob,jones 
1,jenny,jones 
2,sally,johnson 
3,john,smith 
3,barb,smith 

Desired output:

1 [{"first_name":"bob","last_name":"jones"},{"first_name":"jenny","last_name":"jones"}] 
2 [{"first_name":"sally","last_name":"johnson"}] 
3 [{"first_name":"john","last_name":"smith"},{"first_name":"barb","last_name":"smith"}] 

I tried collect_list and collect_set, but they only accept primitive data types. Any ideas on how I can achieve this in Hive?

Answers


I would use this jar (Brickhouse). It provides a better implementation of collect that also accepts complex data types.

Query

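-- Register Brickhouse's collect UDAF, which accepts complex types such as structs
add jar /path/to/jar/brickhouse-0.7.1.jar; 
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF'; 

-- Build one struct per row, then gather the structs into one array per house
select house_id 
    , collect(named_struct("first_name", first_name, "last_name", last_name)) 
from db.table 
group by house_id;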

Output

1 [{"first_name":"bob","last_name":"jones"}, {"first_name":"jenny","last_name":"jones"}] 
2 [{"first_name":"sally","last_name":"johnson"}] 
3 [{"first_name":"john","last_name":"smith"},{"first_name":"barb","last_name":"smith"}] 
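To sanity-check what the query above should return, here is a small Python model of `collect(named_struct(...)) ... group by house_id` applied to the sample rows. It is only a local sketch, not Hive code: the function names `group_into_structs` and `format_like_hive` are hypothetical, and unlike Hive it keeps rows in input order (Hive does not guarantee the order of elements inside a collected array).

```python
import csv
import io
import json
from collections import defaultdict

# Sample rows from the question, assumed to be a CSV with a header line.
SAMPLE = """house_id,first_name,last_name
1,bob,jones
1,jenny,jones
2,sally,johnson
3,john,smith
3,barb,smith
"""


def group_into_structs(text):
    """Group rows by house_id, turning each row into a {first_name, last_name} dict.

    This mirrors named_struct(...) per row plus collect(...) per group.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row["house_id"]].append(
            {"first_name": row["first_name"], "last_name": row["last_name"]}
        )
    return dict(groups)


def format_like_hive(groups):
    """Render each group as '<house_id> <json array>' with compact separators."""
    return [
        f"{key} {json.dumps(structs, separators=(',', ':'))}"
        for key, structs in sorted(groups.items(), key=lambda kv: int(kv[0]))
    ]


if __name__ == "__main__":
    for line in format_like_hive(group_into_structs(SAMPLE)):
        print(line)
```

Running it prints the same three lines as the desired output in the question.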

Perfect! Works as described. – Cymon


Is there a way to do this without having to declare the named_struct explicitly? For example: collect(*) – samol


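You can also use a workaround with the built-in collect_list: concatenate the fields of each row into a single string, then collect those strings per house.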

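-- Flatten each row into one "first_name:...,last_name:..." string, then collect per house
select house_id, collect_list(full_name) full_name_list from (
    select 
     concat_ws(',', 
      concat("first_name:",first_name), 
      concat("last_name:",last_name) 
      ) full_name, 
     house_id 
    from house) a 
group by house_id;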
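Note that this workaround returns an `array<string>` rather than an array of structs, so whatever consumes it has to parse each element back into fields. The Python sketch below is a hypothetical local model: `concat_ws` here only approximates Hive's function for non-NULL string arguments, and `parse_full_name` is an illustrative helper, not part of Hive. It shows what the inner expressions produce per row and one way to split the strings back, which breaks if a name itself contains `,` or `:`.

```python
from collections import defaultdict

# Sample rows from the question: (house_id, first_name, last_name).
ROWS = [
    (1, "bob", "jones"),
    (1, "jenny", "jones"),
    (2, "sally", "johnson"),
    (3, "john", "smith"),
    (3, "barb", "smith"),
]


def concat_ws(sep, *parts):
    """Rough stand-in for Hive's concat_ws, valid only for non-NULL strings."""
    return sep.join(parts)


def workaround(rows):
    """Build the per-row full_name string and collect the strings per house_id."""
    groups = defaultdict(list)
    for house_id, first, last in rows:
        full_name = concat_ws(",", "first_name:" + first, "last_name:" + last)
        groups[house_id].append(full_name)
    return dict(groups)


def parse_full_name(s):
    """Split 'first_name:bob,last_name:jones' back into a dict of fields.

    Assumes no value contains ',' or ':'; otherwise the split is ambiguous.
    """
    return dict(pair.split(":", 1) for pair in s.split(","))


if __name__ == "__main__":
    for house_id, names in workaround(ROWS).items():
        print(house_id, [parse_full_name(n) for n in names])
```

If you need real structs downstream, the Brickhouse approach avoids this round trip through strings.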