Specifying the compression codec for an INSERT OVERWRITE SELECT
I have a Hive table like this:
CREATE TABLE beacons
(
foo string,
bar string,
foonotbar string
)
COMMENT "Digest of daily beacons, by day"
PARTITIONED BY (day string COMMENT "In YYYY-MM-DD format");
To populate the Hive table, I'm doing something like this:
SET hive.exec.compress.output=True;
SET io.seqfile.compression.type=BLOCK;
INSERT OVERWRITE TABLE beacons PARTITION (day = "2011-01-26") SELECT
someFunc(query, "foo") as foo,
someFunc(query, "bar") as bar,
otherFunc(query, "foo||bar") as foonotbar
FROM raw_logs
WHERE day = "2011-01-26";
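This builds a new partition whose individual output files are compressed with deflate, but ideally they would go through the LZO compression codec instead.
Unfortunately, I'm not exactly sure how to accomplish that, but I assume it's one of the many runtime settings, or perhaps just an additional line in the CREATE TABLE DDL.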
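For reference, here is a rough sketch of what the runtime-setting route might look like. It is untested and rests on assumptions: that the hadoop-lzo library is installed on every node of the cluster, that its codec class is registered in io.compression.codecs, and that the cluster still honors the older mapred.* property name (newer Hadoop releases call it mapreduce.output.fileoutputformat.compress.codec).

```sql
-- Sketch only: assumes hadoop-lzo is installed and registered on the cluster.
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
-- Assumption: LzopCodec (from hadoop-lzo) writes .lzo files;
-- com.hadoop.compression.lzo.LzoCodec is the alternative class in the same library.
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

INSERT OVERWRITE TABLE beacons PARTITION (day = "2011-01-26")
SELECT
  someFunc(query, "foo") AS foo,
  someFunc(query, "bar") AS bar,
  otherFunc(query, "foo||bar") AS foonotbar
FROM raw_logs
WHERE day = "2011-01-26";
```

If this is the right knob, the codec would be chosen per session rather than stored in the table definition, so the CREATE TABLE DDL would not need to change.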