How do I read Hive partitions into an Apache Crunch pipeline?

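I am able to read text files in HDFS into an Apache Crunch pipeline, but now I need to read Hive partitions. The problem is that, by our design, I am not supposed to access the files directly. So I need some way to reach the partitions through something like HCatalog. How do I read Hive partitions into an Apache Crunch pipeline?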


2014-10-20 Jijo Mathew

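You can use the org.apache.hadoop.hive.metastore API or the HCat API. Here is a simple example using hive.metastore. Unless you want to join against some Hive partitions inside a Mapper/Reducer, you have to make this call before, or at the start of, your Pipeline.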

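import java.util.List;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

// Connect to the metastore using the Hive configuration available to the client.
HiveMetaStoreClient hiveClient = new HiveMetaStoreClient(new HiveConf());
// The last argument is the maximum number of partitions to return; it is a short.
List<Partition> partitions = hiveClient.listPartitions("default", "my_hive_table", (short) 1000);
for (Partition partition : partitions) {
    System.out.println("HDFS data location of the partition: " + partition.getSd().getLocation());
}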
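Once the partition locations are known, one way to feed them into a pipeline is to read each location as its own input and union the results. The following is only a minimal sketch of that fold: the class name PartitionInputs, the helper readAll, and the sample locations are hypothetical, and the reader and union used in main are plain-string stand-ins so the example runs without a cluster. The Crunch wiring shown in the comment assumes text-format partitions and uses Pipeline.readTextFile and PCollection.union; other storage formats would need a different source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

class PartitionInputs {
    /**
     * Reads every partition location with `reader` and merges the results
     * pairwise with `union`. Returns null when `locations` is empty.
     */
    static <T> T readAll(List<String> locations,
                         Function<String, T> reader,
                         BinaryOperator<T> union) {
        T combined = null;
        for (String location : locations) {
            T next = reader.apply(location);
            combined = (combined == null) ? next : union.apply(combined, next);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Hypothetical locations, as they might come back from getSd().getLocation().
        List<String> locations = Arrays.asList(
                "hdfs://nn/warehouse/my_hive_table/dt=2014-10-19",
                "hdfs://nn/warehouse/my_hive_table/dt=2014-10-20");

        // Stand-in reader and union. With Crunch on the classpath (an assumption,
        // not part of this sketch) the call would look roughly like:
        //   PCollection<String> lines =
        //       readAll(locations, pipeline::readTextFile, PCollection::union);
        String plan = readAll(locations,
                loc -> "read(" + loc + ")",
                (a, b) -> a + " + " + b);
        System.out.println(plan);
    }
}
```

Keeping the fold generic means the metastore lookup stays outside the pipeline, which matches the advice to make the call before the Pipeline starts.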

The only other thing you need is to export Hive's conf directory:

export HIVE_CONF_DIR=/home/mmichalski/hive/conf
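If the variable is missing or points at the wrong directory, the metastore client may fall back to default settings and fail in confusing ways. A small pre-flight check can catch that early. This is a sketch under stated assumptions: the class name HiveConfDirCheck and the helper findHiveSite are hypothetical, and the check only verifies that a hive-site.xml file exists in the directory, not that its contents are valid.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

class HiveConfDirCheck {
    /** Returns the path to hive-site.xml inside confDir, if that file exists. */
    static Optional<Path> findHiveSite(String confDir) {
        if (confDir == null || confDir.isEmpty()) {
            return Optional.empty();
        }
        Path site = Paths.get(confDir, "hive-site.xml");
        return Files.isRegularFile(site) ? Optional.of(site) : Optional.empty();
    }

    public static void main(String[] args) {
        // Read the same variable that the export line above sets.
        String confDir = System.getenv("HIVE_CONF_DIR");
        Optional<Path> site = findHiveSite(confDir);
        if (site.isPresent()) {
            System.out.println("Using " + site.get());
        } else {
            System.err.println("HIVE_CONF_DIR is unset or has no hive-site.xml: " + confDir);
        }
    }
}
```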


2014-11-21 23:02:29 Marcin
