2012-01-15 53 views
6

I'm new to Hive and have run into a problem: how can I get Hive to run MapReduce jobs concurrently?

I have a table in Hive like this:

create table td(id int, time string, ip string, v1 bigint, v2 int, v3 int, 
v4 int, v5 bigint, v6 int) PARTITIONED BY(dt STRING) 
ROW FORMAT DELIMITED FIELDS 
TERMINATED BY ',' lines TERMINATED BY '\n' ; 

I run a query like this:

from td 
INSERT OVERWRITE DIRECTORY '/tmp/total.out' select count(v1) 
INSERT OVERWRITE DIRECTORY '/tmp/totaldistinct.out' select count(distinct v1) 
INSERT OVERWRITE DIRECTORY '/tmp/distinctuin.out' select distinct v1 

INSERT OVERWRITE DIRECTORY '/tmp/v4.out' select v4 , count(v1), count(distinct v1) group by v4 
INSERT OVERWRITE DIRECTORY '/tmp/v3v4.out' select v3, v4 , count(v1), count(distinct v1) group by v3, v4 

INSERT OVERWRITE DIRECTORY '/tmp/v426.out' select count(v1), count(distinct v1) where v4=2 or v4=6 
INSERT OVERWRITE DIRECTORY '/tmp/v3v426.out' select v3, count(v1), count(distinct v1) where v4=2 or v4=6 group by v3 

INSERT OVERWRITE DIRECTORY '/tmp/v415.out' select count(v1), count(distinct v1) where v4=1 or v4=5 
INSERT OVERWRITE DIRECTORY '/tmp/v3v415.out' select v3, count(v1), count(distinct v1) where v4=1 or v4=5 group by v3 

It works, and the output is what I want.

But there is one problem: Hive generates 9 MapReduce jobs and runs them one after another.

I ran EXPLAIN on this query and got the following output:

STAGE DEPENDENCIES: 
    Stage-9 is a root stage 
    Stage-0 depends on stages: Stage-9 
    Stage-10 depends on stages: Stage-9 
    Stage-1 depends on stages: Stage-10 
    Stage-11 depends on stages: Stage-9 
    Stage-2 depends on stages: Stage-11 
    Stage-12 depends on stages: Stage-9 
    Stage-3 depends on stages: Stage-12 
    Stage-13 depends on stages: Stage-9 
    Stage-4 depends on stages: Stage-13 
    Stage-14 depends on stages: Stage-9 
    Stage-5 depends on stages: Stage-14 
    Stage-15 depends on stages: Stage-9 
    Stage-6 depends on stages: Stage-15 
    Stage-16 depends on stages: Stage-9 
    Stage-7 depends on stages: Stage-16 
    Stage-17 depends on stages: Stage-9 
    Stage-8 depends on stages: Stage-17 

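It seems that stages 9-17 correspond to MapReduce jobs 0-8.
But according to the EXPLAIN output above, stages 10-17 depend only on stage 9,
so my question is: why can't jobs 1-8 run concurrently?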

Or, how can I make jobs 1-8 run concurrently?

Thank you very much for your help!

Answers

5

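In hive-default.xml there is a property named "hive.exec.parallel" that enables parallel execution of jobs. Its default value is "false". You can change it to "true" to get this capability. You can use another property, "hive.exec.parallel.thread.number", to control the maximum number of jobs that can be executed in parallel.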

For details: https://issues.apache.org/jira/browse/HIVE-549
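
As a minimal sketch, assuming you would rather enable this for a single session than edit the XML configuration, the two properties named above can be set from the Hive CLI or at the top of a script before running the multi-insert query. The thread count of 8 is only an illustrative value chosen to match the eight independent stages in the question; tune it to what your cluster can handle.

```sql
-- Allow independent stages of one query to be submitted concurrently
-- (applies to the current session only).
SET hive.exec.parallel=true;

-- Cap on how many jobs may run at the same time.
-- 8 is an illustrative choice matching stages 10-17 in the question.
SET hive.exec.parallel.thread.number=8;
```

Running `SET hive.exec.parallel;` with no value afterwards prints the current setting, which is a quick way to confirm the change took effect.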

+0

This works! Thank you very much! – SSolid 2012-01-18 07:19:26

+0

@kai zhang I understand that when "hive.exec.parallel" is set to true, independent tasks run in parallel. Can you think of any use case where it needs to be set to false? – 2013-08-13 17:40:20

+0

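@MayankJaiswal As far as I know, setting "hive.exec.parallel" to "false" was recommended in very early versions (e.g. 0.7). I think the only reason was that the feature was not stable enough at the time. – 2013-09-05 11:10:20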
