2015-06-10

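Hive JOIN query with subquery takes forever

I have been playing with Hive lately. Most things have gone well, but I am stuck on a transformation that should turn rows like these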

2015-04-01 device1 traffic other  start 
2015-04-01 device1 traffic violation deny 
2015-04-01 device1 traffic violation deny 
2015-04-02 device1 traffic other  start 
2015-04-03 device1 traffic other  start 
2015-04-03 device1 traffic other  start 

into

2015-04-01 1  2 
2015-04-02 1  
2015-04-03 2  

I tried the query below, but for some reason its reduce phase gets stuck at 96% no matter how long I wait.

SELECT pass.date, COUNT(pass.type), COUNT(deny.deny_type) FROM firewall_logs as pass 
JOIN (
SELECT date, type as deny_type FROM firewall_logs 
WHERE device = 'device1' 
AND date LIKE '2015-04-%' 
AND type = 'traffic' AND subtype = 'violation' and status = 'deny' 
) deny ON (pass.date = deny.date ) 
WHERE pass.device = 'device1' 
AND pass.date LIKE '2015-04-%' 
AND pass.type = 'traffic' AND pass.subtype = 'other' AND pass.status = 'start' 
GROUP BY pass.date ORDER BY pass.date ; 

All the MR2 logs show is:

2015-06-11 01:54:04,206 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 9028000 rows for join key [2015-04-26] 
2015-06-11 01:54:04,423 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 9128000 rows for join key [2015-04-26] 
2015-06-11 01:54:04,638 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 9228000 rows for join key [2015-04-26] 
2015-06-11 01:54:04,838 INFO [main] org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1 

Does anyone have an idea why?

1 Answer

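I avoid self-joins in Hive like the plague. You can get the same result without a join by counting rows per status and then collecting the counts into a map with Brickhouse's collect UDAF: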

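-- Register the Brickhouse collect UDAF (builds a map from key/value pairs).
add jar ./brickhouse-0.7.1.jar;
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';

select date
     , c_map['start'] starts
     , c_map['deny'] denies
from (
    -- One map per date: status -> row count.
    select date
         , collect(status, c) c_map
    from (
        -- Count rows per date and status in a single pass over the table.
        select date, status
             , count(subtype) c
        from firewall_logs
        where device = 'device1' and type = 'traffic'
        group by date, status) x
    group by date) y;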
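
If adding the Brickhouse jar is not an option, the same single-pass idea can be sketched with plain conditional aggregation. This is an alternative technique, not the approach above, and it assumes the table and column names from the question (firewall_logs, device, date, type, subtype, status). It also avoids the likely cause of the stall: the original join matches on date alone, so every 'start' row is paired with every 'deny' row for that date, which is consistent with the millions of rows per join key in the log.

```sql
-- Sketch under the assumptions above: one scan of firewall_logs, no join, no UDF.
-- `date` is backtick-quoted because newer Hive versions treat it as a reserved word.
SELECT `date`,
       -- Count only the "pass" rows (subtype 'other', status 'start').
       SUM(CASE WHEN subtype = 'other' AND status = 'start' THEN 1 ELSE 0 END) AS starts,
       -- Count only the "deny" rows (subtype 'violation', status 'deny').
       SUM(CASE WHEN subtype = 'violation' AND status = 'deny' THEN 1 ELSE 0 END) AS denies
FROM firewall_logs
WHERE device = 'device1'
  AND `date` LIKE '2015-04-%'
  AND type = 'traffic'
GROUP BY `date`
ORDER BY `date`;
```

On the six sample rows from the question this yields 1 and 2 for 2015-04-01, 1 and 0 for 2015-04-02, and 2 and 0 for 2015-04-03. Dates without denies show 0 here, whereas the map lookup in the query above returns NULL for them.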