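The best way is to check the explain plan. Both queries produce the same explain plan. Even an insert statement behaves in a similar way. It may have been different in earlier versions of Hive.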
explain select * from (
select * from departments
union all
select * from departments
) q;
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: departments
Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: department_id (type: int), department_name (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE
Union
Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
TableScan
alias: departments
Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: department_id (type: int), department_name (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE
Union
Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.124 seconds, Fetched: 55 row(s)
explain
select * from departments
union all
select * from departments
;
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: departments
Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: department_id (type: int), department_name (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE
Union
Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
TableScan
alias: departments
Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: department_id (type: int), department_name (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 14 Data size: 1538 Basic stats: COMPLETE Column stats: NONE
Union
Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 28 Data size: 3076 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.064 seconds, Fetched: 55 row(s)
The columns of table1 are "col1" and "col2" –
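These queries are the same. You can make the subqueries run in parallel, which can improve performance. Set hive.exec.parallel=true; and hive.exec.parallel.thread.number=8; (the maximum number of parallel threads allowed) – leftjoin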
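As a minimal sketch of the suggestion in the comment above, the two settings can be applied at session level before running the query. The `departments` table is the one from the answer. The value 8 is the one quoted in the comment, not a tuned recommendation. As far as I know, `hive.exec.parallel` only lets independent stages of a plan run concurrently, so it is worth re-running `explain` to see whether the plan has more than one such stage.

```sql
-- Session-level settings quoted in the comment above.
-- Assumption: they are set in the same session that runs the query.
set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=8;

-- Re-check the plan under these settings before relying on any speedup.
explain
select * from departments
union all
select * from departments;
```

In the plans shown in the answer, both table scans sit inside a single Map Reduce stage (Stage-1), so for this particular query the setting may make no measurable difference.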