MapReduce on Hive tables using HCatalog

I am trying to write a map-reduce job that computes the distribution of field values in a Hive table (Hadoop 2.2.0.2.0.6.0-101). For example:

Input Hive table "ATable":

+-------+--------+
| name  | rating |
+-------+--------+
| Bond  | 7      |
| Megre | 2      |
| Holms | 11     |
| Puaro | 7      |
| Holms | 1      |
| Puaro | 7      |
| Megre | 2      |
| Puaro | 7      |
+-------+--------+

The map-reduce job should produce the following output table, also in Hive:

+--------+-------+-------+
| Field  | Value | Count |
+--------+-------+-------+
| name   | Bond  | 1     |
| name   | Puaro | 3     |
| name   | Megre | 2     |
| name   | Holms | 2     |
| rating | 7     | 4     |
| rating | 11    | 1     |
| rating | 1     | 1     |
| rating | 2     | 2     |
+--------+-------+-------+
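
The intended computation can be sketched in plain Java, with no Hadoop or HCatalog dependency. This is only an illustration of the map and reduce steps, not the actual job: the class name FieldValueDistribution and all of its methods are hypothetical, and a row is modelled as a simple map from column name to value. In a real job the column names would come from the HCatalog table schema rather than from a hard-coded map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FieldValueDistribution {

    // "Map" step: for one row, emit one "field<TAB>value" key per column.
    static List<String> mapRow(Map<String, Object> row) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Object> column : row.entrySet()) {
            keys.add(column.getKey() + "\t" + column.getValue());
        }
        return keys;
    }

    // "Reduce" step: count how many times each key was emitted.
    static Map<String, Integer> countKeys(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Runs both steps over all rows, standing in for the whole job.
    static Map<String, Integer> distribution(List<Map<String, Object>> rows) {
        List<String> emitted = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            emitted.addAll(mapRow(row));
        }
        return countKeys(emitted);
    }

    // Helper that builds one (name, rating) row.
    static Map<String, Object> row(String name, int rating) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", name);
        r.put("rating", rating);
        return r;
    }

    // The rows of the input table "ATable" from the question.
    static List<Map<String, Object>> sampleRows() {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("Bond", 7));
        rows.add(row("Megre", 2));
        rows.add(row("Holms", 11));
        rows.add(row("Puaro", 7));
        rows.add(row("Holms", 1));
        rows.add(row("Puaro", 7));
        rows.add(row("Megre", 2));
        rows.add(row("Puaro", 7));
        return rows;
    }

    public static void main(String[] args) {
        // Print one "field<TAB>value<TAB>count" line per distinct pair.
        for (Map.Entry<String, Integer> e : distribution(sampleRows()).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Keying on the (field, value) pair lets a single reducer pass produce the counts for every column at once, which is why the mapper needs the column names from the table metadata.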

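To get the field names and values I need access to the HCatalog metadata, so that I can use them in the map method (org.apache.hadoop.mapreduce.Mapper). For this I tried to adapt the following example: http://java.dzone.com/articles/mapreduce-hive-tables-using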

The code from this example compiles, but produces a lot of deprecation warnings:

    protected void map(WritableComparable key, HCatRecord value,
            org.apache.hadoop.mapreduce.Mapper.Context context)
            throws IOException, InterruptedException {

        // Get table schema
        HCatSchema schema = HCatBaseInputFormat.getTableSchema(context);

        Integer year = new Integer(value.getString("year", schema));
        Integer month = new Integer(value.getString("month", schema));
        Integer DayofMonth = value.getInteger("dayofmonth", schema);

        context.write(new IntWritable(month), new IntWritable(DayofMonth));
    }

The deprecation warnings are for:

HCatRecord 
HCatSchema 
HCatBaseInputFormat.getTableSchema

Where can I find a similar example of using HCatalog in map-reduce with the latest, non-deprecated interfaces?

Thanks!

2014-03-04 Anton Ashanin

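I used the example given in one of the Cloudera examples, and used the framework given in this blog to compile my code. I also had to add the hcatalog Maven repo to pom.xml. This example uses the new mapreduce API rather than the deprecated mapred API. Hope this helps.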

    <dependency>
        <groupId>org.apache.hcatalog</groupId>
        <artifactId>hcatalog-core</artifactId>
        <version>0.11.0</version>
    </dependency>

2014-04-25 16:28:59
