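Bucketing not working in Hive
I have bucketed columns, but even after setting all the parameters I am not getting any performance benefit. Below are the query I am using and the buckets I created, followed by the explain plan output.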
select count(*) from bigtable_main a inner join
big_cnt10000 b where a.srrecordid = b.srrecordid;
---112 seconds....
ALTER TABLE bigtable_main CLUSTERED BY(srrecordid) SORTED BY(srrecordid) INTO 40 BUCKETS ;
ALTER TABLE big_cnt10000 CLUSTERED BY(srrecordid) SORTED BY(srrecordid) INTO 40 BUCKETS ;
---112 seconds....
---------------------------------------------------
SET hive.enforce.bucketing=true;
SET hive.optimize.bucketmapjoin=true;
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
Even the explain plan is the same. Any ideas?
Vertex dependency in root stage
Map 1 <- Map 3 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 2
File Output Operator [FS_13]
compressed:false
Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
Group By Operator [GBY_11]
| aggregations:["count(VALUE._col0)"]
| outputColumnNames:["_col0"]
| Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
|<-Map 1 [SIMPLE_EDGE]
Reduce Output Operator [RS_10]
sort order:
Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
value expressions:_col0 (type: bigint)
Group By Operator [GBY_9]
aggregations:["count()"]
outputColumnNames:["_col0"]
Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
Select Operator [SEL_8]
Statistics:Num rows: 31669970 Data size: 3166997036 Basic stats: COMPLETE Column stats: NONE
Filter Operator [FIL_16]
predicate:(_col0 = _col11) (type: boolean)
Statistics:Num rows: 31669970 Data size: 3166997036 Basic stats: COMPLETE Column stats: NONE
Map Join Operator [MAPJOIN_19]
| condition map:[{"":"Inner Join 0 to 1"}]
| HybridGraceHashJoin:true
| keys:{"Map 3":"srrecordid (type: string)","Map 1":"srrecordid (type: string)"}
| outputColumnNames:["_col0","_col11"]
| Statistics:Num rows: 63339940 Data size: 6333994073 Basic stats: COMPLETE Column stats: NONE
|<-Map 3 [BROADCAST_EDGE]
| Reduce Output Operator [RS_5]
| key expressions:srrecordid (type: string)
| Map-reduce partition columns:srrecordid (type: string)
| sort order:+
| Statistics:Num rows: 42529 Data size: 4252905 Basic stats: COMPLETE Column stats: NONE
| Filter Operator [FIL_18]
| predicate:srrecordid is not null (type: boolean)
| Statistics:Num rows: 42529 Data size: 4252905 Basic stats: COMPLETE Column stats: NONE
| TableScan [TS_1]
| alias:b
| Statistics:Num rows: 85058 Data size: 8505810 Basic stats: COMPLETE Column stats: NONE
|<-Filter Operator [FIL_17]
predicate:srrecordid is not null (type: boolean)
Statistics:Num rows: 57581763 Data size: 5758176306 Basic stats: COMPLETE Column stats: NONE
TableScan [TS_0]
alias:a
Statistics:Num rows: 115163525 Data size: 11516352512 Basic stats: COMPLETE Column stats: NONE
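One likely cause: according to the Hive DDL documentation, ALTER TABLE ... CLUSTERED BY ... INTO n BUCKETS only changes Hive's metadata and does not reorganize the files that already exist. After the two ALTER statements above, the data is therefore still not physically bucketed or sorted, and the optimizer keeps choosing the same broadcast map join. The sketch below rewrites the data into bucketed copies. It assumes the tables are not partitioned (SELECT * would not fit a partitioned target), and the names bigtable_main_bkt and big_cnt10000_bkt are hypothetical.

```sql
-- Sketch only: bigtable_main_bkt and big_cnt10000_bkt are hypothetical table names.
-- Assumes unpartitioned source tables.

-- Hive 1.x: make INSERT honor the bucket/sort spec. These properties were removed
-- in Hive 2.x, where bucketing and sorting are always enforced.
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

-- Copy the schema, declare bucketing on the still-empty copy, then rewrite the data
-- so it is physically split into 40 files sorted by srrecordid.
CREATE TABLE bigtable_main_bkt LIKE bigtable_main;
ALTER TABLE bigtable_main_bkt CLUSTERED BY (srrecordid) SORTED BY (srrecordid) INTO 40 BUCKETS;
INSERT OVERWRITE TABLE bigtable_main_bkt SELECT * FROM bigtable_main;

-- Same bucket count and key on the other side, so the buckets line up one to one.
CREATE TABLE big_cnt10000_bkt LIKE big_cnt10000;
ALTER TABLE big_cnt10000_bkt CLUSTERED BY (srrecordid) SORTED BY (srrecordid) INTO 40 BUCKETS;
INSERT OVERWRITE TABLE big_cnt10000_bkt SELECT * FROM big_cnt10000;

-- Join settings from the question.
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
SET hive.auto.convert.sortmerge.join=true;

-- Join key stated in ON rather than in WHERE.
SELECT COUNT(*)
FROM bigtable_main_bkt a
INNER JOIN big_cnt10000_bkt b ON a.srrecordid = b.srrecordid;
```

After the rewrite, running EXPLAIN on the new query should show whether the plan has moved away from the plain BROADCAST_EDGE map join shown above.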
Thank you very much... it worked as you said.