2016-06-13

Hive sort-merge-bucket map (SMB map) join

I ran the same query with and without the SMB join and got different results. Please help explain.

SET hive.enforce.bucketing=true; 

create table dbaproceduresbuckets (
owner   string , 
object_name  string , 
procedure_name string , 
object_id  double , 
subprogram_id double , 
overload  string , 
object_type  string , 
aggregate  string , 
pipelined  string , 
impltypeowner string , 
impltypename string , 
parallel  string , 
interface  string , 
deterministic string , 
authid   string) 
CLUSTERED BY (object_id) SORTED BY (OBJECT_ID ASC) INTO 32 BUCKETS; 

CREATE TABLE dbaobjectsbuckets1(
owner   string, 
object_name  string, 
subobject_name string, 
object_id  double, 
data_object_id double, 
object_type  string, 
created   string, 
last_ddl_time string, 
timestamp  string, 
status   string, 
temporary  string, 
generated  string, 
secondary  string, 
namespace  double, 
edition_name  string) CLUSTERED BY (object_id) SORTED BY (OBJECT_ID ASC) INTO 32 BUCKETS; 

**** load the table; 

0: jdbc:hive2://xxxxxx:10000> select count(*) from dbaobjectsbuckets1 a, dbaproceduresbuckets b
0: jdbc:hive2://xxxxxxxx:10000> where a.object_id = b.object_id;
INFO : Hadoop job information for Stage-2: number of mappers: 3; number of reducers: 1
INFO : 2016-06-13 15:56:00,381 Stage-2 map = 0%, reduce = 0%
INFO : 2016-06-13 15:56:55,818 Stage-2 map = 1%, reduce = 0%, Cumulative CPU 122.6 sec
INFO : 2016-06-13 15:57:47,124 Stage-2 map = 7%, reduce = 0%, Cumulative CPU 326.86 sec
.........
INFO : 2016-06-13 16:05:01,246 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 867.1 sec
INFO : MapReduce Total cumulative CPU time: 14 minutes 27 seconds 100 msec
INFO : Ended Job = job_1464280256859_0146
+--------+--+
|  _c0   |
+--------+--+
| 54876  |
+--------+--+

**** 
set hive.auto.convert.sortmerge.join=true; 
set hive.optimize.bucketmapjoin=true; 
set hive.optimize.bucketmapjoin.sortedmerge=true; 
set hive.auto.convert.sortmerge.join.noconditionaltask=true; 
set hive.enforce.bucketing=true; 
set hive.enforce.sorting=true; 

0: jdbc:hive2://xxxxxxx:10000> select count(*) from dbaobjectsbuckets1 a, dbaproceduresbuckets b 

0: jdbc:hive2://xxxxxxxx:10000> where a.object_id = b.object_id;

In the execution plan, I see:

| Sorted Merge Bucket Map Join Operator |
|   condition map: |
|     Inner Join 0 to 1 |
|   keys: |
|     0 object_id (type: double) |
|     1 object_id (type: double) |

**** but the result is showing 
INFO : Hadoop job information for Stage-1: number of mappers: 32; number of reducers: 1 
    ...... 
INFO : MapReduce Total cumulative CPU time: 4 minutes 8 seconds 490 msec 

INFO : Ended Job = job_1464280256859_0150
+------+--+
| _c0  |
+------+--+
| 2    |
+------+--+

My question is: why do I get only 2 when I use the SMB join? It should be 54876.

Thanks!

Answer

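Use a SORT BY clause when inserting data into the sorted table, or run

set hive.enforce.sorting=true;

before inserting data into the sorted table. An SMB join merges matching buckets on the assumption that each one is already sorted on the join key. If the rows were written unsorted, the merge skips matching rows and the count comes out too low.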
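
The reload could look roughly like the sketch below. It assumes the raw rows sit in two staging tables, here given the hypothetical names dbaprocedures_src and dbaobjects_src, with the same columns as the bucketed tables; the question does not show how the tables were actually loaded. As posted, the question sets hive.enforce.sorting only before the join query, which would be too late if the data was already in place. Note also that LOAD DATA only moves files into the table directory and does not bucket or sort them.

```sql
-- Sketch only: dbaprocedures_src and dbaobjects_src are hypothetical
-- staging tables, not names taken from the question.

-- Both settings must be active BEFORE the data is written.
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

-- Rewrite both tables so every bucket file is hashed and sorted
-- on object_id, as the table definitions declare.
INSERT OVERWRITE TABLE dbaproceduresbuckets
SELECT * FROM dbaprocedures_src;

INSERT OVERWRITE TABLE dbaobjectsbuckets1
SELECT * FROM dbaobjects_src;

-- Sanity check: with the SMB conversion switched off the query uses
-- an ordinary join, so its count can be compared with the SMB result.
SET hive.auto.convert.sortmerge.join=false;
SELECT COUNT(*)
FROM dbaobjectsbuckets1 a
JOIN dbaproceduresbuckets b
  ON a.object_id = b.object_id;
```

If both counts agree after the reload, the buckets are sorted as the SMB join expects. Without the two enforce settings, the manual alternative is to set the number of reducers to the bucket count (32 here) and end each INSERT ... SELECT with CLUSTER BY object_id.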