2015-08-27 78 views
1

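Hive QL - select based on a condition, grouped into one row

I have a date-partitioned Hive table with a row for every user. It has an activity_log column whose value is 1 or 0, depending on whether the user performed the activity on that date.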

I also have a UDF, something like dayOfWeek(), that gives me the day of the week for a given date.

I am trying to create a table that contains each user's activity over the past week. So the columns would be:

user, activity_log_mon, activity_log_tue, activity_log_wed, ...activity_log_sun 

Each activity_log column should hold 1 or 0, indicating whether or not the user performed the activity on that day of the past week.

Here is a query that gets me almost what I want:

SELECT user, 
IF(dayOfWeek(date)='sun', activity_log , NULL) as activity_log_sun, 
IF(dayOfWeek(date)='mon', activity_log , NULL) as activity_log_mon, 
IF(dayOfWeek(date)='tue', activity_log , NULL) as activity_log_tue, 
IF(dayOfWeek(date)='wed', activity_log , NULL) as activity_log_wed, 
IF(dayOfWeek(date)='thu', activity_log , NULL) as activity_log_thu, 
IF(dayOfWeek(date)='fri', activity_log , NULL) as activity_log_fri, 
IF(dayOfWeek(date)='sat', activity_log , NULL) as activity_log_sat 
FROM user_activity_table 
WHERE date >= '2015-08-18' AND date <= '2015-08-24' 

But this gives me 7 rows per user, like this:

user activity_log_sun activity_log_mon .... activity_log_sat 

abcd   1     NULL      NULL 
abcd   NULL     0      NULL 
... 
abcd   NULL    NULL      1 

What I really want is a table with just one row per user, like this:

user activity_log_sun activity_log_mon .... activity_log_sat 

abcd   1     0       1 

How can I consolidate these rows? Or, what is the best way to get rows like this in the first place?

Answers

0

Observe the behavior of the following HiveQL:

SELECT COALESCE(collected[0], collected[1], collected[2], collected[3]) 
FROM(Select Array(NULL, 1, NULL, NULL) as collected) a; 

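This returns 1, because that is the first non-NULL value passed to the COALESCE function. Next, note that there is an aggregate function, collect_list(col), which gathers the values of a group into an array.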

So, if we call your 7-rows-per-user output activity_uncollected, your final transformation would be:

SELECT user_id, 
    COALESCE(collected_mon[0], collected_mon[1], ..., collected_mon[6]) AS activity_log_mon, 
    ... 
    COALESCE(collected_sun[0], collected_sun[1], ..., collected_sun[6]) AS activity_log_sun 
FROM 
    (SELECT user_id, 
    collect_list(activity_log_mon) AS collected_mon, 
    ..., 
    collect_list(activity_log_sun) AS collected_sun 
    FROM activity_uncollected 
    GROUP BY user_id) a; 

This groups all the values per user, per day, and then picks the non-NULL value out of each array.
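The approach above can be tried on a small sample without touching the real table. The following is a minimal sketch, not a definitive implementation: the two inline rows are hypothetical stand-ins for activity_uncollected, only the sun and mon columns are shown, and it assumes a Hive version that supports CTEs and SELECT without FROM (0.13 or later, as far as I know).

```sql
-- Hypothetical two-row sample standing in for activity_uncollected
-- (only the sun and mon day columns are included here).
WITH activity_uncollected AS (
  SELECT 'abcd' AS user_id,
         1 AS activity_log_sun,
         CAST(NULL AS INT) AS activity_log_mon
  UNION ALL
  SELECT 'abcd' AS user_id,
         CAST(NULL AS INT) AS activity_log_sun,
         0 AS activity_log_mon
)
-- Outer query: pick the first non-NULL element of each collected array.
SELECT user_id,
       COALESCE(collected_sun[0], collected_sun[1]) AS activity_log_sun,
       COALESCE(collected_mon[0], collected_mon[1]) AS activity_log_mon
FROM (
  -- Inner query: gather each day column into one array per user.
  SELECT user_id,
         collect_list(activity_log_sun) AS collected_sun,
         collect_list(activity_log_mon) AS collected_mon
  FROM activity_uncollected
  GROUP BY user_id
) a;
```

With the sample rows this should collapse to a single row for abcd. Listing every index inside COALESCE keeps the query independent of whether collect_list keeps or drops NULL entries, since an out-of-range array index in Hive evaluates to NULL.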

+0

I wanted to use COALESCE, but I couldn't figure out how. What I actually ended up doing was summing the columns of 'activity_uncollected', GROUPed BY user. – ubuntunoob

1

Here is what I ended up doing:

SELECT user, 
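     SUM(activity_log_sun) AS activity_log_sun, 
     SUM(activity_log_mon) AS activity_log_mon, 
     SUM(activity_log_tue) AS activity_log_tue, 
     SUM(activity_log_wed) AS activity_log_wed, 
     SUM(activity_log_thu) AS activity_log_thu, 
     SUM(activity_log_fri) AS activity_log_fri, 
     SUM(activity_log_sat) AS activity_log_sat 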
FROM ( 
SELECT user, 
IF(dayOfWeek(date)='sun', activity_log , NULL) as activity_log_sun, 
IF(dayOfWeek(date)='mon', activity_log , NULL) as activity_log_mon, 
IF(dayOfWeek(date)='tue', activity_log , NULL) as activity_log_tue, 
IF(dayOfWeek(date)='wed', activity_log , NULL) as activity_log_wed, 
IF(dayOfWeek(date)='thu', activity_log , NULL) as activity_log_thu, 
IF(dayOfWeek(date)='fri', activity_log , NULL) as activity_log_fri, 
IF(dayOfWeek(date)='sat', activity_log , NULL) as activity_log_sat 
FROM user_activity_table 
WHERE date >= '2015-08-18' AND date <= '2015-08-24' 
) t 
GROUP BY user
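
The same result can probably be had in a single pass by moving the IF() inside the aggregate, so no subquery is needed. This is a sketch under a few assumptions: dayOfWeek() is the asker's own UDF and returns 'sun' through 'sat' as in the question; each user has at most one row per date; a day with no row for the user should read as 0 (the COALESCE default is my choice, not part of the original query); and user and date are backtick-quoted because some Hive versions treat them as reserved words.

```sql
-- Sketch: pivot and collapse in one query.
-- MAX ignores NULLs, so it returns the single non-NULL value per day;
-- COALESCE(..., 0) is an assumed default for days without a row.
SELECT `user`,
       COALESCE(MAX(IF(dayOfWeek(`date`) = 'sun', activity_log, NULL)), 0) AS activity_log_sun,
       COALESCE(MAX(IF(dayOfWeek(`date`) = 'mon', activity_log, NULL)), 0) AS activity_log_mon,
       COALESCE(MAX(IF(dayOfWeek(`date`) = 'tue', activity_log, NULL)), 0) AS activity_log_tue,
       COALESCE(MAX(IF(dayOfWeek(`date`) = 'wed', activity_log, NULL)), 0) AS activity_log_wed,
       COALESCE(MAX(IF(dayOfWeek(`date`) = 'thu', activity_log, NULL)), 0) AS activity_log_thu,
       COALESCE(MAX(IF(dayOfWeek(`date`) = 'fri', activity_log, NULL)), 0) AS activity_log_fri,
       COALESCE(MAX(IF(dayOfWeek(`date`) = 'sat', activity_log, NULL)), 0) AS activity_log_sat
FROM user_activity_table
WHERE `date` >= '2015-08-18' AND `date` <= '2015-08-24'
GROUP BY `user`;
```

MAX and SUM give the same answer as long as there is one row per user per date. If duplicates are possible, MAX keeps the flag at 1 while SUM would count the repeats.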