Hive - Partitioned table

I created a Hive table with this query:

create table studpart4(id int, name string) partitioned by (course string, year int) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

It was created successfully.

I loaded the data with the command below:

load data local inpath '/scratch/hive_inputs/student_input_1.txt' overwrite into table studpart4 partition(course='cse',year=2);

My input data file looks like this:

101 student1 cse 1 

102 student2 cse 2 

103 student3 eee 3 

104 student4 eee 4 

105 student5 cse 1 

106 student6 cse 2 

107 student7 eee 3 

108 student8 eee 4 

109 student9 cse 1 

110 student10 cse 2

But the output (of select * from studpart4) shows:

101 student1 cse 2 

102 student2 cse 2 

103 student3 eee 2 

104 student4 eee 2 

105 student5 cse 2 

106 student6 cse 2 

107 student7 eee 2 

108 student8 eee 2 

109 student9 cse 2 

110 student10 cse 2

Why is the last column always 2? Why was it changed and stored incorrectly?

Source

2016-08-20 Suresh J

http://stackoverflow.com/a/13224581/2079249

The results you are seeing are exactly what you told Hive to do with your data.

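In your first command, you create a partitioned table studpart4 with two columns, id and name, and two partition keys, course and year (which, once created, behave like regular columns). In your second command, what you are doing is this: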

load data local inpath '/scratch/hive_inputs/student_input_1.txt' overwrite into table studpart4 partition(course='cse',year=2)

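This basically means "copy all the data from student_input_1.txt into table studpart4, and set every value of the course column to 'cse' and every value of the year column to '2'". Internally, Hive creates a directory structure that contains the partition keys. Your data will be stored in a directory like this: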

.../studpart4/course=cse/year=2/
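To check which partitions exist after the load, you can list them. This is only a small sketch; given the single load command above, I would expect just one entry.

```sql
show partitions studpart4;
-- expected to list only: course=cse/year=2
```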

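I suspect that what you really want is for Hive to detect the values of the course and year columns in your .txt file and set the right values for you. To do that, you have to use dynamic partitioning: first load your data into an external table, then use an INSERT OVERWRITE TABLE command to store the data into your studpart4 table. The link BigDataLearner posted in the comments describes this strategy.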
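A minimal sketch of that approach, assuming a staging table named studstage (a hypothetical name, not from the question) and a tab-separated input file. I use a plain staging table here for brevity; an external table pointing at the directory that holds the file would fill the same role.

```sql
-- Staging table: all four fields from the file are ordinary columns here.
-- "studstage" is a hypothetical name chosen for this example.
create table studstage(id int, name string, course string, year int)
row format delimited fields terminated by '\t' lines terminated by '\n'
stored as textfile;

load data local inpath '/scratch/hive_inputs/student_input_1.txt'
overwrite into table studstage;

-- Allow Hive to derive every partition value from the data.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Partition columns must come last in the select, in the same order
-- as in the partition clause.
insert overwrite table studpart4 partition(course, year)
select id, name, course, year from studstage;
```

With the sample data from the question, this should create one directory per distinct (course, year) pair, for example course=cse/year=1 and course=eee/year=3, instead of putting every row under course=cse/year=2.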

I hope this helps.

Source

2016-08-21 15:59:27

Very nice. Thanks for the detailed explanation. It is clear to me now.

You're welcome :-)
