2016-08-10

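Bucketing not working in Hive

I have bucketed the tables on the join column, but even after setting all the parameters I get no performance benefit. Below are the query I am running and the buckets I created; I have also added the explain plan output.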

select count(*) from bigtable_main a inner join 
big_cnt10000 b where a.srrecordid = b.srrecordid; 
---112 seconds.... 

ALTER TABLE bigtable_main CLUSTERED BY(srrecordid) SORTED BY(srrecordid) INTO 40 BUCKETS ; 
ALTER TABLE big_cnt10000 CLUSTERED BY(srrecordid) SORTED BY(srrecordid) INTO 40 BUCKETS ; 

---112 seconds.... 
--------------------------------------------------- 
SET hive.enforce.bucketing=true; 
SET hive.optimize.bucketmapjoin=true; 
set hive.auto.convert.sortmerge.join=true; 
set hive.optimize.bucketmapjoin.sortedmerge = true; 

Even the explain plan is the same. Any idea?
Vertex dependency in root stage 
Map 1 <- Map 3 (BROADCAST_EDGE) 
Reducer 2 <- Map 1 (SIMPLE_EDGE) 

Stage-0 
    Fetch Operator 
     limit:-1 
     Stage-1 
     Reducer 2 
     File Output Operator [FS_13] 
      compressed:false 
      Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 
      table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"} 
      Group By Operator [GBY_11] 
      | aggregations:["count(VALUE._col0)"] 
      | outputColumnNames:["_col0"] 
      | Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 
      |<-Map 1 [SIMPLE_EDGE] 
       Reduce Output Operator [RS_10] 
        sort order: 
        Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 
        value expressions:_col0 (type: bigint) 
        Group By Operator [GBY_9] 
        aggregations:["count()"] 
        outputColumnNames:["_col0"] 
        Statistics:Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 
        Select Operator [SEL_8] 
         Statistics:Num rows: 31669970 Data size: 3166997036 Basic stats: COMPLETE Column stats: NONE 
         Filter Operator [FIL_16] 
          predicate:(_col0 = _col11) (type: boolean) 
          Statistics:Num rows: 31669970 Data size: 3166997036 Basic stats: COMPLETE Column stats: NONE 
          Map Join Operator [MAPJOIN_19] 
          | condition map:[{"":"Inner Join 0 to 1"}] 
          | HybridGraceHashJoin:true 
          | keys:{"Map 3":"srrecordid (type: string)","Map 1":"srrecordid (type: string)"} 
          | outputColumnNames:["_col0","_col11"] 
          | Statistics:Num rows: 63339940 Data size: 6333994073 Basic stats: COMPLETE Column stats: NONE 
          |<-Map 3 [BROADCAST_EDGE] 
          | Reduce Output Operator [RS_5] 
          |  key expressions:srrecordid (type: string) 
          |  Map-reduce partition columns:srrecordid (type: string) 
          |  sort order:+ 
          |  Statistics:Num rows: 42529 Data size: 4252905 Basic stats: COMPLETE Column stats: NONE 
          |  Filter Operator [FIL_18] 
          |  predicate:srrecordid is not null (type: boolean) 
          |  Statistics:Num rows: 42529 Data size: 4252905 Basic stats: COMPLETE Column stats: NONE 
          |  TableScan [TS_1] 
          |   alias:b 
          |   Statistics:Num rows: 85058 Data size: 8505810 Basic stats: COMPLETE Column stats: NONE 
          |<-Filter Operator [FIL_17] 
           predicate:srrecordid is not null (type: boolean) 
           Statistics:Num rows: 57581763 Data size: 5758176306 Basic stats: COMPLETE Column stats: NONE 
           TableScan [TS_0] 
            alias:a 
            Statistics:Num rows: 115163525 Data size: 11516352512 Basic stats: COMPLETE Column stats: NONE 

Answer


The Hive compiler needs metadata to decide on the execution plan (see the doc).

To get that metadata, the compiler sends a getMetaData request to the MetaStore and receives it back in a sendMetaData response.

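This metadata is used to typecheck the expressions in the query tree and to prune partitions based on the query predicates. The plan generated by the compiler is a DAG of stages, where each stage is a map/reduce job, a metadata operation, or an HDFS operation. For map/reduce stages, the plan contains map operator trees (executed on the mappers) and a reduce operator tree (for operations that need reducers).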

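An ALTER TABLE storage statement changes the physical storage properties recorded for the table, but not the operator metadata (that is, the operator tree executed on the mappers), and it does not rewrite the existing data.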
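One way to see this after running the ALTER statements from the question is to compare what the metastore reports with what is actually on HDFS. A minimal sketch from the Hive CLI follows; the warehouse path is a hypothetical default, so take the real one from the Location field that DESCRIBE FORMATTED prints.

```sql
-- The ALTER updated the metastore, so this should now list
-- "Num Buckets: 40" and "Bucket Columns: [srrecordid]".
DESCRIBE FORMATTED bigtable_main;

-- The data itself was not rewritten: the table directory still holds
-- the original files, not 40 files hashed by srrecordid.
-- (Hypothetical path; use the Location shown above.)
dfs -ls /user/hive/warehouse/bigtable_main;
```

If the file listing does not line up with the 40 declared buckets, the planner cannot rely on the bucket layout, which is consistent with the plan in the question still using a plain broadcast map join.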

Instead, declare the bucketing correctly when you create the table, and load the data into it through Hive.
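A sketch of that approach, assuming a Hive 1.x setup like the one in the question: create empty copies of the tables, declare the bucketing while they are still empty, then rewrite the data through Hive so every row lands in the bucket file chosen by hashing srrecordid. The `_bkt` and `_unbucketed` table names are hypothetical. CREATE TABLE ... LIKE is used only to avoid retyping the column list, which the question does not show.

```sql
-- Hive 1.x: make INSERT honor the CLUSTERED BY / SORTED BY spec.
-- (These two properties were removed in Hive 2.0, where bucketing
-- and sorting are always enforced.)
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

-- Hypothetical names: empty tables with the same columns and storage
-- format as the originals.
CREATE TABLE bigtable_main_bkt LIKE bigtable_main;
CREATE TABLE big_cnt10000_bkt LIKE big_cnt10000;

-- Declare the bucketing while the tables are empty. Both sides use the
-- same key and the same bucket count.
ALTER TABLE bigtable_main_bkt CLUSTERED BY (srrecordid) SORTED BY (srrecordid) INTO 40 BUCKETS;
ALTER TABLE big_cnt10000_bkt CLUSTERED BY (srrecordid) SORTED BY (srrecordid) INTO 40 BUCKETS;

-- Rewrite the data through Hive so that it is physically hashed into
-- 40 sorted bucket files per table.
INSERT OVERWRITE TABLE bigtable_main_bkt SELECT * FROM bigtable_main;
INSERT OVERWRITE TABLE big_cnt10000_bkt SELECT * FROM big_cnt10000;

-- Swap the bucketed copies in under the original names.
ALTER TABLE bigtable_main RENAME TO bigtable_main_unbucketed;
ALTER TABLE bigtable_main_bkt RENAME TO bigtable_main;
ALTER TABLE big_cnt10000 RENAME TO big_cnt10000_unbucketed;
ALTER TABLE big_cnt10000_bkt RENAME TO big_cnt10000;
```

Declaring the bucketing on empty tables is safe: there is no existing data whose layout could contradict the metadata, which is exactly the mismatch the documentation note warns about.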

The linked documentation explains this in detail:

NOTE: These commands will only modify Hive's metadata, and will NOT reorganize or reformat existing data. Users should make sure the actual data layout conforms with the metadata definition.
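Once the data really matches the metadata, a quick check is to rerun the question's query under EXPLAIN with the same settings. This sketch assumes bigtable_main and big_cnt10000 have already been recreated and reloaded with the bucketing declared up front. On Tez, a bucket map join typically shows a CUSTOM_EDGE instead of BROADCAST_EDGE, and a sort-merge bucket join shows a Merge Join Operator. Treat those operator names as a rough guide rather than a guarantee, because they vary between Hive versions.

```sql
-- The settings from the question, without the duplicated line.
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
SET hive.auto.convert.sortmerge.join=true;

-- Same query, but with the join key written in ON instead of WHERE
-- so the equi-join condition is explicit.
EXPLAIN
SELECT count(*)
FROM bigtable_main a
JOIN big_cnt10000 b ON a.srrecordid = b.srrecordid;
```

If the plan still shows Map 3 feeding Map 1 over a BROADCAST_EDGE, first confirm that the table directories now contain one file per bucket.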


Thank you very much... it worked just as you said.