2015-12-21 49 views

I am inserting into a Hive table using data selected from two other tables, combined with UNION ALL, and I would like to optimize the insert query.

First query:

insert overwrite table table1 
select uniod.col1,uniod.col2 from (
select col1, col2 from table2 
UNION ALL 
select col1, col2 from table3 
) uniod; 

Second query:

insert overwrite table table1 
select col1, col2 from table2 
UNION ALL 
select col1, col2 from table3 
; 

My question: do these two queries perform the same, or is one better than the other?


The fields of table1 are "col1" and "col2". –


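These queries are the same. You can make the subqueries run in parallel, which will improve performance. Set hive.exec.parallel=true; and hive.exec.parallel.thread.number=8 (the maximum number of threads allowed to run in parallel). – leftjoin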
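As a minimal sketch of the comment's suggestion, the two properties can be set at session level before running the insert. The table and column names are the ones from the question. Note that hive.exec.parallel only lets independent stages of a plan run concurrently, so whether it helps depends on the plan Hive generates for your version and data; treat the gain as something to measure, not a given.

```sql
-- Session-level settings from the comment above.
-- hive.exec.parallel lets independent stages of one query run concurrently.
SET hive.exec.parallel=true;
-- Upper bound on how many stages may run at the same time.
SET hive.exec.parallel.thread.number=8;

-- First query from the question, unchanged apart from formatting.
INSERT OVERWRITE TABLE table1
SELECT uniod.col1, uniod.col2
FROM (
  SELECT col1, col2 FROM table2
  UNION ALL
  SELECT col1, col2 FROM table3
) uniod;
```

Because SET applies only to the current session, these values do not affect other users or later sessions.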

Answer


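The best approach is to check the explain plan. Both forms produce the same explain plan, as the two outputs here show, and the insert statements are executed in a similar way. The behavior may have been different in earlier versions of Hive.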

explain select * from (
select * from departments 
union all 
select * from departments 
) q; 

STAGE DEPENDENCIES: 
    Stage-1 is a root stage 
    Stage-0 depends on stages: Stage-1 

STAGE PLANS: 
    Stage: Stage-1 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      alias: departments 
      Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
      Select Operator 
       expressions: department_id (type: int), department_name (type: string) 
       outputColumnNames: _col0, _col1 
       Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
       Union 
       Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
       Select Operator 
        expressions: _col0 (type: int), _col1 (type: string) 
        outputColumnNames: _col0, _col1 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        File Output Operator 
        compressed: false 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        table: 
         input format: org.apache.hadoop.mapred.TextInputFormat 
         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
      TableScan 
      alias: departments 
      Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
      Select Operator 
       expressions: department_id (type: int), department_name (type: string) 
       outputColumnNames: _col0, _col1 
       Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
       Union 
       Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
       Select Operator 
        expressions: _col0 (type: int), _col1 (type: string) 
        outputColumnNames: _col0, _col1 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        File Output Operator 
        compressed: false 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        table: 
         input format: org.apache.hadoop.mapred.TextInputFormat 
         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 

    Stage: Stage-0 
    Fetch Operator 
     limit: -1 
     Processor Tree: 
     ListSink 

Time taken: 0.124 seconds, Fetched: 55 row(s) 

explain 
select * from departments 
union all 
select * from departments 
; 

STAGE DEPENDENCIES: 
    Stage-1 is a root stage 
    Stage-0 depends on stages: Stage-1 

STAGE PLANS: 
    Stage: Stage-1 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      alias: departments 
      Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
      Select Operator 
       expressions: department_id (type: int), department_name (type: string) 
       outputColumnNames: _col0, _col1 
       Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
       Union 
       Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
       Select Operator 
        expressions: _col0 (type: int), _col1 (type: string) 
        outputColumnNames: _col0, _col1 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        File Output Operator 
        compressed: false 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        table: 
         input format: org.apache.hadoop.mapred.TextInputFormat 
         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
      TableScan 
      alias: departments 
      Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
      Select Operator 
       expressions: department_id (type: int), department_name (type: string) 
       outputColumnNames: _col0, _col1 
       Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE 
       Union 
       Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
       Select Operator 
        expressions: _col0 (type: int), _col1 (type: string) 
        outputColumnNames: _col0, _col1 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        File Output Operator 
        compressed: false 
        Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE 
        table: 
         input format: org.apache.hadoop.mapred.TextInputFormat 
         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 

    Stage: Stage-0 
    Fetch Operator 
     limit: -1 
     Processor Tree: 
     ListSink 

Time taken: 0.064 seconds, Fetched: 55 row(s) 

Great. Thank you. –
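The plans in the answer were generated for plain SELECT statements on a departments table. To check the original INSERT statements directly, the same comparison can be sketched as below, assuming table1, table2 and table3 exist as described in the question. EXPLAIN only prints the plan and does not run the insert, so table1 is not overwritten.

```sql
-- Plan for the first form: UNION ALL wrapped in a subquery.
EXPLAIN
INSERT OVERWRITE TABLE table1
SELECT uniod.col1, uniod.col2
FROM (
  SELECT col1, col2 FROM table2
  UNION ALL
  SELECT col1, col2 FROM table3
) uniod;

-- Plan for the second form: UNION ALL at the top level.
EXPLAIN
INSERT OVERWRITE TABLE table1
SELECT col1, col2 FROM table2
UNION ALL
SELECT col1, col2 FROM table3;
```

If the two outputs list the same stages and operator trees, the queries should perform the same on that Hive version. If the second statement fails to parse, the installed version may not accept a top-level UNION ALL, in which case the subquery form is the one to use.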