I export my DynamoDB tables to S3 as a backup (via EMR). When I export, I store the data as LZO-compressed files. My Hive query is below, but essentially I followed "To export an Amazon DynamoDB table to an Amazon S3 bucket using data compression" from http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/EMR_Hive_Commands.html
I now want to do the reverse: take my LZO files and get them back into a Hive table. How do you do that? I expected to see some hive configuration property for input, but there isn't one. I've googled and found some hints, but nothing definitive and nothing that works.
The files in S3 are of the form: s3://[mybucket]/backup/year=2012/month=08/day=01/000000.lzo
Here is my HQL that does the export:
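-- Read DynamoDB at the full provisioned throughput and LZO-compress the job output: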
SET dynamodb.throughput.read.percent=1.0;
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
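-- External table mapped onto the live DynamoDB table via the storage handler: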
CREATE EXTERNAL TABLE hiveSBackup (id bigint, periodStart string, allotted bigint, remaining bigint, created string, seconds bigint, served bigint, modified string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "${DYNAMOTABLENAME}",
"dynamodb.column.mapping" = "id:id,periodStart:periodStart,allotted:allotted,remaining:remaining,created:created,seconds:seconds,served:served,modified:modified");
CREATE EXTERNAL TABLE s3_export (id bigint, periodStart string, allotted bigint, remaining bigint, created string, seconds bigint, served bigint, modified string)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://<mybucket>/backup';
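-- Copy every row from DynamoDB into the given static partition on S3: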
INSERT OVERWRITE TABLE s3_export
PARTITION (year="${PARTITIONYEAR}", month="${PARTITIONMONTH}", day="${PARTITIONDAY}")
SELECT * from hiveSBackup;
Any ideas on how to get this data out of S3, decompressed, and back into a Hive table?
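For reference, here is a minimal sketch of what I imagine the import side should look like, assuming the cluster has com.hadoop.compression.lzo.LzopCodec registered in io.compression.codecs so Hive can decompress the .lzo text files transparently on read (the table name s3_import and the partition values are just examples):
-- External table over the LZO-compressed backup; Hive should pick the
-- codec from the .lzo file extension when reading, assuming the codec
-- is registered on the cluster.
CREATE EXTERNAL TABLE s3_import (id bigint, periodStart string, allotted bigint, remaining bigint, created string, seconds bigint, served bigint, modified string)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://<mybucket>/backup';
-- Partitions under LOCATION are not discovered automatically; each one
-- has to be registered (on EMR's Hive, ALTER TABLE s3_import RECOVER
-- PARTITIONS; should pick them all up in one go):
ALTER TABLE s3_import ADD PARTITION (year='2012', month='08', day='01');
SELECT * FROM s3_import LIMIT 10;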
Do you have a complete example HQL script that works? I tried what you mentioned without success. Again, my data is partitioned. I just want to import into Hive, not into DynamoDB. – rynop 2012-09-18 15:44:10
Edited my answer to add an example. – Tim 2012-09-18 16:47:42