2017-04-17

Hive query execution plan

Here is my Hive query:

Insert into schemaB.employee partition(year) 
select * from schemaA.employee; 

Below is the execution plan generated for this query.

hive> explain <query>; 

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: employee
            Statistics: Num rows: 65412411 Data size: 59121649936 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: Col1 (type: binary), col2 (type: binary), col3 (type: array<string>), year (type: int)
              outputColumnNames: _col0, _col1, _col2, _col3
              Statistics: Num rows: 65412411 Data size: 59121649936 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col3 (type: int)
                sort order: +
                Map-reduce partition columns: _col3 (type: int)
                Statistics: Num rows: 65412411 Data size: 59121649936 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: binary), _col1 (type: binary), _col2 (type: array<string>), _col3 (type: int)
      Reduce Operator Tree:
        Extract
          Statistics: Num rows: 65412411 Data size: 59121649936 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: true
            Statistics: Num rows: 65412411 Data size: 59121649936 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                name: schemaB.employee

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            year
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: schemaB.employee

  Stage: Stage-2
    Stats-Aggr Operator

I have two related questions about this execution plan:

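  1. Why is there a reduce step in the query plan? As I understand it, all the query has to do is copy data from one HDFS location to another, which mappers alone could do. Is the reduce step related to the partitions in the table?
  2. What is the Stats-Aggr Operator step that appears in Stage-2? I could not find any documentation that explains it.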

Answer

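  1. Writing the records is also the job of the reduce phase. Because you are writing back to a distributed file system (HDFS), the write can run in parallel across however many reducers are required or specified. In the plan, the Reduce Output Operator shuffles rows on _col3, which is the dynamic partition column year, so all rows for a given partition reach the same reducer.
  2. The Stats-Aggr Operator collects statistics for the table you are writing to, for example the number of rows in each partition and patterns in the column data. Hive uses these statistics to generate query plans when that table is queried later.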
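
Both points can be checked from a Hive session. The following is a minimal sketch, not a confirmed fix: it assumes the reduce stage comes from the sorted dynamic partition optimization (hive.optimize.sort.dynamic.partition) and that Stage-2 comes from automatic statistics gathering (hive.stats.autogather). Neither setting appears in the question, and their defaults differ between Hive versions. The partition value 2016 is a placeholder.

```sql
-- Sketch only: run in a Hive session against the tables from the question.

-- A fully dynamic partition insert (no static partition value) needs these.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Print the current value. When this optimization is on, Hive sorts rows by
-- the dynamic partition column, which adds a shuffle and a reduce stage.
SET hive.optimize.sort.dynamic.partition;

-- Experiment: turn it off and compare the new plan with the one above.
SET hive.optimize.sort.dynamic.partition=false;
EXPLAIN
INSERT INTO TABLE schemaB.employee PARTITION (year)
SELECT * FROM schemaA.employee;

-- Print whether statistics are gathered automatically during INSERT.
SET hive.stats.autogather;

-- Inspect the statistics stored for one partition (2016 is a placeholder).
USE schemaB;
DESCRIBE FORMATTED employee PARTITION (year=2016);
```

If the second EXPLAIN no longer shows a Reduce Operator Tree, the reduce step was introduced by the sort optimization rather than required by the copy itself. Without that sort, each mapper may hold one open ORC writer per partition it encounters, which can increase memory use. The DESCRIBE FORMATTED output lists values such as numRows and totalSize among the partition parameters when statistics have been collected.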