MapReduce input from HTable

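I have a MapReduce job whose input comes from an HTable. From Java MapReduce code, how do you set the job's input format to the HBase TableInputFormat?

Is there something like a JDBC connection for connecting to an HTable database?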

2013-06-20 T. Webster

Do you want to access HBase from MapReduce? – zsxwing

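If your client and HBase run on the same machine, you do not need to configure anything for the client to talk to HBase. Just create an HBaseConfiguration instance and connect to your HTable: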

Configuration conf = HBaseConfiguration.create(); 
HTable table = new HTable(conf, "TABLE_NAME");

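However, if your client runs on a remote machine, it relies on ZooKeeper to talk to your HBase cluster, so the client needs the location of the ZooKeeper quorum before it can proceed. This is how we usually configure our clients so that they can connect to the HBase cluster: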

Configuration conf = HBaseConfiguration.create(); 
conf.set("hbase.zookeeper.quorum", "ZK_MACHINE_IP/HOSTNAME"); 
conf.set("hbase.zookeeper.property.clientPort","2181"); 
HTable table = new HTable(conf, "TABLE_NAME");
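
When the ensemble has more than one ZooKeeper server, `hbase.zookeeper.quorum` takes a comma-separated list of hosts. Here is a minimal sketch, using only the JDK, of assembling the two settings before copying them into the `Configuration`. The class `ZkClientSettings`, its method `forQuorum`, and the host names are hypothetical and only illustrate the shape of the values:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HBase: builds the two client-side
// settings shown above as plain key/value pairs.
public class ZkClientSettings {

    public static Map<String, String> forQuorum(List<String> hosts, int clientPort) {
        if (hosts == null || hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one ZooKeeper host is required");
        }
        if (clientPort < 1 || clientPort > 65535) {
            throw new IllegalArgumentException("invalid port: " + clientPort);
        }
        Map<String, String> settings = new LinkedHashMap<>();
        // The quorum value is a comma-separated list of ZooKeeper hosts
        settings.put("hbase.zookeeper.quorum", String.join(",", hosts));
        settings.put("hbase.zookeeper.property.clientPort", Integer.toString(clientPort));
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder host names, assumed for illustration
        Map<String, String> settings = forQuorum(
                Arrays.asList("zk1.example.com", "zk2.example.com", "zk3.example.com"), 2181);
        // With HBase on the classpath, each pair would be passed to conf.set(key, value)
        settings.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Each entry would then be applied with `conf.set(key, value)` on the `Configuration` created by `HBaseConfiguration.create()`.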

That is how you do it through the Java API. HBase also supports several other APIs; you can find more here.

Coming to your first question: if you need to use TableInputFormat as the InputFormat in your MR job, you set it through the Job object, like this:

job.setInputFormatClass(TableInputFormat.class);
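
Setting the input format class alone does not tell `TableInputFormat` which table to read; it takes the table name from the job configuration. Below is a hedged sketch of wiring this up by hand, assuming the Hadoop and HBase MapReduce jars are on the classpath. `ReadJobSketch`, `RowCountMapper`, and the counter names are hypothetical, and `TABLE_NAME` is the same placeholder used above:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ReadJobSketch {

    // Hypothetical mapper: counts rows with a job counter and emits nothing.
    public static class RowCountMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
                throws IOException, InterruptedException {
            context.getCounter("sketch", "rows").increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // TableInputFormat reads the table name from this property. Set it
        // before creating the Job, because the Job copies the configuration.
        conf.set(TableInputFormat.INPUT_TABLE, "TABLE_NAME");

        Job job = new Job(conf, "ReadJobSketch");
        job.setJarByClass(ReadJobSketch.class);
        job.setInputFormatClass(TableInputFormat.class);
        job.setMapperClass(RowCountMapper.class);
        job.setNumReduceTasks(0);                          // map-only job
        job.setOutputFormatClass(NullOutputFormat.class);  // nothing is written

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With no scan configured, the sketch relies on `TableInputFormat` falling back to a default full-table scan; narrowing it to particular column families or row ranges is left out here.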

Hope this answers your question.


2013-06-20 19:43:54 Tariq

HBase ships with a TableMapReduceUtil class that makes it easy to set up map/reduce jobs. Here is the first sample from the manual:

Configuration config = HBaseConfiguration.create(); 
Job job = new Job(config, "ExampleRead"); 
job.setJarByClass(MyReadJob.class);  // class that contains mapper 

Scan scan = new Scan(); 
scan.setCaching(500);  // 1 is the default in Scan, which will be bad for MapReduce jobs 
scan.setCacheBlocks(false); // don't set to true for MR jobs 
// set other scan attrs 
... 

TableMapReduceUtil.initTableMapperJob(
    tableName,  // input HBase table name 
    scan,    // Scan instance to control CF and attribute selection 
    MyMapper.class, // mapper 
    null,    // mapper output key 
    null,    // mapper output value 
    job); 
job.setOutputFormatClass(NullOutputFormat.class); // because we aren't emitting anything from mapper 

boolean b = job.waitForCompletion(true); 
if (!b) { 
    throw new IOException("error with job!"); 
}
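
The sample refers to `MyMapper` without showing it. Here is a minimal sketch of what such a mapper could look like, assuming it only inspects each row and emits nothing, which matches the `null` output types and `NullOutputFormat` above. The column family `cf`, the qualifier `attr1`, and the counter names are made up for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

// TableMapper fixes the input types: the key is the row key and the value
// is the Result holding the cells selected by the Scan.
public class MyMapper extends TableMapper<Text, Text> {

    // Hypothetical column coordinates, assumed for this sketch
    private static final byte[] FAMILY = Bytes.toBytes("cf");
    private static final byte[] QUALIFIER = Bytes.toBytes("attr1");

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
            throws IOException, InterruptedException {
        // getValue returns null when the row has no cell for this column
        byte[] value = row.getValue(FAMILY, QUALIFIER);
        if (value == null) {
            context.getCounter("MyMapper", "rowsWithoutAttr1").increment(1);
        } else {
            context.getCounter("MyMapper", "rowsWithAttr1").increment(1);
        }
    }
}
```

Because the mapper writes nothing to the context, the counters are the only result of the job; they show up in the job's counter report once `waitForCompletion` returns.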


2013-06-21 02:17:19
