
How do you specify uberization for a Hive query in Hadoop 2?

There is a new capability in Hadoop 2 called uberization. For example, this reference says:

Uberization is the possibility to run all tasks of a MapReduce job in the ApplicationMaster's JVM if the job is small enough. This way, you avoid the overhead of requesting containers from the ResourceManager and asking the NodeManagers to start (supposedly small) tasks.

What I can't tell is whether this happens magically behind the scenes or whether something needs to be done to make it happen. For example, when running a Hive query, is there a setting (or hint) to trigger it? Can you specify the threshold for what counts as "small enough"?

Also, I'm having a hard time finding much written about this concept. Does it go by another name?

Answers

4

I found details about "uber jobs" in the YARN book by Arun Murthy:

An uber job occurs when multiple mappers and reducers are combined to use a single container. There are four core settings surrounding uber-job configuration, found in the mapred-site.xml options presented in Table 9.3.

Here is Table 9.3:

|-----------------------------------+--------------------------------------------------------------|
| mapreduce.job.ubertask.enable     | Whether to enable the small-jobs "ubertask" optimization,    |
|                                   | which runs "sufficiently small" jobs sequentially within a   |
|                                   | single JVM. "Small" is defined by the maxmaps, maxreduces,   |
|                                   | and maxbytes settings. Users may override this value.        |
|                                   | Default = false.                                             |
|-----------------------------------+--------------------------------------------------------------|
| mapreduce.job.ubertask.maxmaps    | Threshold for the number of maps beyond which the job is     |
|                                   | considered too big for the ubertasking optimization. Users   |
|                                   | may override this value, but only downward.                  |
|                                   | Default = 9.                                                 |
|-----------------------------------+--------------------------------------------------------------|
| mapreduce.job.ubertask.maxreduces | Threshold for the number of reduces beyond which the job is  |
|                                   | considered too big for the ubertasking optimization.         |
|                                   | Currently the code cannot support more than one reduce and   |
|                                   | will ignore larger values. (Zero is a valid maximum,         |
|                                   | however.) Users may override this value, but only downward.  |
|                                   | Default = 1.                                                 |
|-----------------------------------+--------------------------------------------------------------|
| mapreduce.job.ubertask.maxbytes   | Threshold for the number of input bytes beyond which the     |
|                                   | job is considered too big for the ubertasking optimization.  |
|                                   | If no value is specified, `dfs.block.size` is used as a      |
|                                   | default. Be sure to specify a default value in               |
|                                   | `mapred-site.xml` if the underlying file system is not HDFS. |
|                                   | Users may override this value, but only downward.            |
|                                   | Default = HDFS block size.                                   |
|-----------------------------------+--------------------------------------------------------------|
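
For concreteness, here is a minimal sketch of what these four properties could look like in mapred-site.xml. The threshold values are illustrative only, not from the book, and per the table they can only be lowered from their defaults:

<!-- mapred-site.xml (sketch): enable uber mode and tighten its thresholds -->
<property>
  <name>mapreduce.job.ubertask.enable</name>
  <value>true</value>  <!-- default is false -->
</property>
<property>
  <name>mapreduce.job.ubertask.maxmaps</name>
  <value>4</value>  <!-- may only be overridden downward from the default of 9 -->
</property>
<property>
  <name>mapreduce.job.ubertask.maxreduces</name>
  <value>1</value>  <!-- at most 1 reduce is currently supported -->
</property>
<property>
  <name>mapreduce.job.ubertask.maxbytes</name>
  <value>67108864</value>  <!-- illustrative: 64 MB, below a 128 MB block size -->
</property>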

I still don't know whether there is a Hive-specific way to set this, or whether you simply use the settings above with Hive.

1

An uber job occurs when multiple mappers and reducers are combined to execute inside the ApplicationMaster. So suppose the job to be executed has MAX mappers <= 9 and MAX reducers <= 1; then the ResourceManager (RM) creates a single ApplicationMaster, which executes the job entirely within its own JVM. You enable it with:

SET mapreduce.job.ubertask.enable=true;

So the benefit of an uberized job is that it eliminates the round-trip overhead of the ApplicationMaster asking the ResourceManager (RM) for containers for the job, and of the RM allocating those containers back to the ApplicationMaster.
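
As a sketch, a Hive session that lowers the thresholds before running a small query might look like the following. The table name small_table is hypothetical, and whether the job actually runs uberized still depends on the plan Hive generates (number of map and reduce tasks, input size):

-- Enable uber mode (default is false); the three thresholds below
-- may only be overridden downward from their defaults.
SET mapreduce.job.ubertask.enable=true;
SET mapreduce.job.ubertask.maxmaps=4;
SET mapreduce.job.ubertask.maxreduces=1;
-- Illustrative: 64 MB, below a typical 128 MB dfs.block.size default
SET mapreduce.job.ubertask.maxbytes=67108864;
-- Hypothetical small query that may now run inside the AM's JVM
SELECT COUNT(*) FROM small_table;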