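Summary: I feel like my system is ignoring the concept of pre-sorted tables. I expected to save time in the sort step because I am using pre-sorted data, but the query plan seems to indicate an intermediate sort step. The gory details of using sorted tables in Hive follow.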
Setup
=======
I have set the following flags:
set hive.enforce.bucketing = true;
set mapred.reduce.tasks=8;
set mapred.map.tasks=8;
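One thing worth checking against the flags above: Hive releases before 2.0 have a separate `hive.enforce.sorting` setting next to `hive.enforce.bucketing`, and the list above only sets the latter. A minimal sketch of the session settings with both enabled, assuming such a release (from Hive 2.0 on, both settings were removed and inserts behave as if they were always on):

```sql
-- Assumes a pre-2.0 Hive release where these two settings still exist.
set hive.enforce.bucketing = true;  -- inserts write one file per declared bucket
set hive.enforce.sorting = true;    -- inserts honor the table's SORTED BY clause
```

With `hive.enforce.bucketing` on, Hive is documented to pick a reducer count matching the bucket count for inserts into a bucketed table, so `mapred.reduce.tasks=8` may be redundant for that step.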
Here I create a table to hold a temporary copy of the data on disk
========
CREATE TABLE trades
(symbol STRING, exchange STRING, price FLOAT, volume INT, cond INT,
 bid FLOAT, ask FLOAT, time STRING)
PARTITIONED BY (dt STRING)
CLUSTERED BY (symbol) SORTED BY (symbol, time) INTO 8 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
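Here I copy the data from disk into the Hive table. By the way, the data on disk is already clustered by symbol and sorted by time. I can't seem to get Hive to make use of that, i.e. to avoid sorting again.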
LOAD DATA LOCAL INPATH '%(dir)s2010-05-07'
INTO TABLE trades
partition (dt='2010-05-07');
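A note on this step: `LOAD DATA` copies or moves the files into the table's directory as they are and does not check that their layout matches the `CLUSTERED BY` / `SORTED BY` declaration, so the clauses on `trades` are a claim about the files rather than something Hive verifies. A sketch of a staging table that makes no such claim, where the name `trades_staging` is hypothetical and not part of the original setup:

```sql
-- Hypothetical staging table: same columns as trades, but without
-- CLUSTERED BY / SORTED BY, since LOAD DATA does not reorganize files.
CREATE TABLE trades_staging
(symbol STRING, exchange STRING, price FLOAT, volume INT, cond INT,
 bid FLOAT, ask FLOAT, time STRING)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```

The bucketed and sorted layout is then produced only by the later insert into the final table.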
Here is the final table, which I use to enforce bucketing and impose the sort order
===========
CREATE TABLE alltrades
(symbol STRING, exchange STRING, price FLOAT, volume INT, cond INT,
 bid FLOAT, ask FLOAT, time STRING)
CLUSTERED BY (symbol) SORTED BY (symbol, time) INTO 8 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
Loading the data from the Hive table
==========
insert overwrite table alltrades
select symbol, exchange, price, volume, cond, bid, ask, time
from trades
distribute by symbol sort by symbol, time;
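It is disappointing to see that every query that needs the data sorted by symbol, time sorts it all over again. Is there a way around this? Also, is there a way to make this whole process work in 1 query step instead of 2?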
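For the insert step, a possible simplification is to let Hive derive the distribution and ordering from the target table's declaration. This is only a sketch, and it assumes a pre-2.0 Hive session in which both `hive.enforce.bucketing` and `hive.enforce.sorting` are set to true:

```sql
-- Assumes hive.enforce.bucketing = true and hive.enforce.sorting = true.
-- Under that assumption the explicit DISTRIBUTE BY / SORT BY is left to Hive,
-- which reads the CLUSTERED BY / SORTED BY clauses of alltrades.
insert overwrite table alltrades
select symbol, exchange, price, volume, cond, bid, ask, time
from trades;
```

This does not merge the load and the insert into a single step; it only shortens the second one.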
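Why sorting doesn't seem to work
=======
Note that the table is built and populated with the sort clauses. I am afraid that, if the sort turns out not to be needed, dropping these clauses would lead to wrong behavior in future reducers.
Here is a query whose plan, in my opinion, should not involve a sort, but actually does.
========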
hive> explain select symbol, time, price from alltrades sort by symbol, time;
OK
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME alltrades)))
(TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT
(TOK_SELEXPR (TOK_TABLE_OR_COL symbol)) (TOK_SELEXPR (TOK_TABLE_OR_COL
time)) (TOK_SELEXPR (TOK_TABLE_OR_COL price))) (TOK_SORTBY
(TOK_TABSORTCOLNAMEASC (TOK_TABLE_OR_COL symbol))
(TOK_TABSORTCOLNAMEASC (TOK_TABLE_OR_COL time)))))
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        alltrades
          TableScan
            alias: alltrades
            Select Operator
              expressions:
                    expr: symbol
                    type: string
                    expr: time
                    type: string
                    expr: price
                    type: float
              outputColumnNames: _col0, _col1, _col2
              Reduce Output Operator
                key expressions:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: string
                sort order: ++
                tag: -1
                value expressions:
                      expr: _col0
                      type: string
                      expr: _col1
                      type: string
                      expr: _col2
                      type: float
      Reduce Operator Tree:
        Extract
          File Output Operator
            compressed: false
            GlobalTableId: 0
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Stage: Stage-0
    Fetch Operator
      limit: -1
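In the plan above, the `Reduce Output Operator` with `sort order: ++` on `_col0, _col1` is the shuffle and sort requested by the explicit `sort by symbol, time`; nothing in the plan suggests that the table's `SORTED BY` metadata was used to skip it. Two checks that may help narrow this down, sketched here without expected output because it depends on the Hive version:

```sql
-- Check that the bucketing and sort metadata were recorded for the table;
-- the output includes the bucket count, bucket columns and sort columns.
describe formatted alltrades;

-- Same projection without the explicit SORT BY, to compare with the plan above.
explain select symbol, time, price from alltrades;
```

If the second plan has no `Reduce Output Operator`, the reduce stage in the original plan comes from the query's own `sort by` clause rather than from the table definition.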
Thanks, but I gave up on Hive and went back ... figured I wanted something lighter, like Python Disco. – fodon