2015-03-25

Hive: joining array<string> columns

I have a Hive table with the following schema, which contains array columns:

hive>desc books; 
gen_id     int           
author     array<string>        
rating     double        
genres     array<string> 

hive>select * from books; 

| gen_id | rating | author    | genres    |
+--------+--------+-----------+-----------+
| 1      | 10     | ["A","B"] | ["X","Y"] |
| 2      | 20     | ["C","A"] | ["Z","X"] |
| 3      | 30     | ["D"]     | ["X"]     |

Is there a way to run a SELECT query that returns a single row for each record, with the two arrays joined into one column, like this:

| gen_id | rating | JoinData          |
+--------+--------+-------------------+
| 1      | 10     | ["A","B","X","Y"] |
| 2      | 20     | ["C","A","Z","X"] |
| 3      | 30     | ["D","X"]         |
| 1      | 10     | "Y"               |

Can someone guide me on how to get this result? Thanks in advance for any help.

Answer

The answer is in this thread:
[1]: http://stackoverflow.com/questions/21578477/array-intersect-hive

For those who don't want to go through the thread:

1) Create a temporary function from the Brickhouse UDF: CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';

2) Run a select statement:

select gen_id 
    , rating 
    , combine(author, genres) as JoinData 
from books
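
Put together, the whole session might look like the sketch below. One assumption that is not in the steps above: the Brickhouse jar has to be on Hive's classpath before the function can be created, and the jar path shown here is only a placeholder, so substitute the location and version you actually have.

```sql
-- Assumption: placeholder path; point this at your own Brickhouse jar.
ADD JAR /path/to/brickhouse.jar;

-- Register the UDF under the name used in the query (session-scoped).
CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';

-- Merge the two array<string> columns into one array per row.
SELECT gen_id
     , rating
     , combine(author, genres) AS JoinData
FROM books;
```

Because the function is temporary, it only exists for the current Hive session, so the ADD JAR and CREATE TEMPORARY FUNCTION lines need to be run again in each new session.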