Hive - external (dynamic) partitioned table

I have a table in MySQL, nas_comps.

select comp_code, count(leg_id) from nas_comps_01012011_31012011 n group by comp_code; 
comp_code  count(leg_id) 
'J'   20640 
'Y'   39680

First, I imported the data into HDFS (Hadoop version 1.0.2) using Sqoop:

sqoop import --connect jdbc:mysql://172.25.37.135/pros_olap2 \
--username hadoopranch \
--password hadoopranch \
--query "select * from nas_comps where dep_date between '2011-01-01' and '2011-01-10' AND \$CONDITIONS" \
-m 1 \
--target-dir /pros/olap2/dataimports/nas_comps

Then I created an external, partitioned Hive table. The partition columns show up when I describe it:

hive> describe extended nas_comps; 
OK 
ds_name string 
dep_date  string 
crr_code  string 
flight_no  string 
orgn string 
dstn string 
physical_cap int 
adjusted_cap int 
closed_cap  int 
leg_id int 
month int 
comp_code  string 

Detailed Table Information  Table(tableName:nas_comps, dbName:pros_olap2_optim, 
owner:hadoopranch, createTime:1374849456, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:ds_name, type:string, comment:null), 
FieldSchema(name:dep_date, type:string, comment:null), FieldSchema(name:crr_code, 
type:string, comment:null), FieldSchema(name:flight_no, type:string, comment:null), 
FieldSchema(name:orgn, type:string, comment:null), FieldSchema(name:dstn, type:string, 
comment:null), FieldSchema(name:physical_cap, type:int, comment:null), 
FieldSchema(name:adjusted_cap, type:int, comment:null), FieldSchema(name:closed_cap, 
type:int, comment:null), FieldSchema(name:leg_id, type:int, comment:null), 
FieldSchema(name:month, type:int, comment:null), FieldSchema(name:comp_code, type:string, 
comment:null)], location:hdfs://172.25.37.21:54300/pros/olap2/dataimports/nas_comps, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, 
numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters: 
{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys: 
[FieldSchema(name:leg_id, type:int, comment:null), FieldSchema(name:month, type:int, 
comment:null), FieldSchema(name:comp_code, type:string, comment:null)], 
parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1374849456}, viewOriginalText:null, 
viewExpandedText:null, tableType:EXTERNAL_TABLE)

But I am not sure whether the partitions were actually created, because:

hive> show partitions nas_comps; 
OK 
Time taken: 0.599 seconds 


select count(1) from nas_comps;

returns 0 records.

How do I create an external Hive table with dynamic partitions?


2013-07-26 Kaliyug Antagonist

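Hive will not create the partitions for you this way.
Just create a table partitioned by the desired partition keys, then run insert overwrite table from the external table into the new partitioned table (with hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict set).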
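A minimal sketch of this approach, written as a shell function that prints the HiveQL. The table names nas_comps_staging (an unpartitioned external table over the Sqoop output) and nas_comps_part (the new partitioned table) are hypothetical, and it assumes the staging table exposes leg_id, month and comp_code as ordinary columns. The column list follows the describe output in the question. With dynamic partitioning, the partition columns come last in the SELECT, in the same order as in the PARTITION clause.

```shell
#!/usr/bin/env bash
# Sketch only: nas_comps_staging and nas_comps_part are hypothetical names.
# Assumes the source table holds all twelve columns as regular columns.

build_dynamic_insert() {
  local src="$1" dst="$2"
  cat <<EOF
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE ${dst} PARTITION (leg_id, month, comp_code)
SELECT ds_name, dep_date, crr_code, flight_no, orgn, dstn,
       physical_cap, adjusted_cap, closed_cap,
       leg_id, month, comp_code
FROM ${src};
EOF
}

# Print the generated statements for inspection.
hql="$(build_dynamic_insert nas_comps_staging nas_comps_part)"
printf '%s\n' "$hql"

# To run them against a real cluster: hive -e "$hql"
```

If the data has many distinct partition values, the limits hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode may need raising, or a coarser partition key may be the better choice.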

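If you must keep the partitioned table external, you have to create the directories manually (one directory per partition, named PARTITION_KEY=VALUE) and then run the MSCK REPAIR TABLE table_name; command.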
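As a rough sketch of that directory layout, the function below builds the path Hive would expect for one partition of the table in the question. The key names and their order come from the partitionKeys in the describe output; the example values (1, 1, J) are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: builds the directory path for one partition.
# Nested directories follow the order of the table's partition keys.

partition_path() {
  local base="$1" leg_id="$2" month="$3" comp_code="$4"
  printf '%s/leg_id=%s/month=%s/comp_code=%s\n' \
    "$base" "$leg_id" "$month" "$comp_code"
}

dir="$(partition_path /pros/olap2/dataimports/nas_comps 1 1 J)"
echo "$dir"

# On a real cluster the remaining steps would be roughly:
#   hadoop fs -mkdir "$dir"    # add -p on Hadoop 2 and later
#   (move the data files for that partition into "$dir")
#   hive -e "MSCK REPAIR TABLE nas_comps;"
```

Files left directly under the table location, as the Sqoop import in the question does, belong to no partition, which is consistent with show partitions coming back empty and the count returning 0.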


2013-07-26 17:02:17 dimamah

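Dynamic partitioning

Partitions are added dynamically while records are being inserted into the Hive table.

Only INSERT statements are supported.
LOAD DATA statements are not supported.
The dynamic partition settings must be enabled before inserting data into the Hive table: hive.exec.dynamic.partition.mode=nonstrict (default strict) and hive.exec.dynamic.partition=true (default false before Hive 0.9.0).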

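Dynamic partition query

SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
INSERT INTO table_name PARTITION (loaded_date)
select * from table_name1 where loaded_date = 20151217;

Here loaded_date is the partition column and 20151217 is its value. With select *, this relies on loaded_date being the last column of table_name1.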

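Limitations:

Dynamic partitioning only works with a statement of the form shown above.
It creates partitions dynamically based on the values selected from the loaded_date column of table_name1.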

If your case does not fit the criteria above, then do the following.

First create the table, then add the partitions like this:

ALTER TABLE table_name ADD PARTITION (DS_NAME='partname1',DATE='partname2');

Or refer to this Link on creating dynamic partitions.


2013-07-26 11:03:55

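Yes, I have already checked that, but those are not dynamic partitions - you still have to supply the values for the partitions.

Right, so run it through a shell script. You can create a variable for the partition in the shell script and pass it into the alter table command; otherwise there is currently no other option available :(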
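The shell-script idea could look roughly like the sketch below, which builds an ALTER TABLE ... ADD PARTITION statement from variables. The partition keys match the table in the question; the values and the LOCATION layout are illustrative assumptions, and add_partition_stmt is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: builds an ALTER TABLE ... ADD PARTITION statement from
# shell variables. Values and the location layout are assumptions.

add_partition_stmt() {
  local table="$1" leg_id="$2" month="$3" comp_code="$4" base="$5"
  printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (leg_id=%s, month=%s, comp_code='%s') LOCATION '%s/leg_id=%s/month=%s/comp_code=%s';\n" \
    "$table" "$leg_id" "$month" "$comp_code" \
    "$base" "$leg_id" "$month" "$comp_code"
}

stmt="$(add_partition_stmt nas_comps 1 1 J /pros/olap2/dataimports/nas_comps)"
echo "$stmt"

# To apply it on a real cluster: hive -e "$stmt"
```

A loop over the distinct partition values, for example read from a file, could call the function once per partition.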
