
On a single-node installation of Hadoop 2.2, I am trying to run the Cloudera example "Accessing Table Data with MapReduce" (i.e. copying data from one Hive table to another):

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_19_6.html

The example code compiles with many deprecation warnings (see below). Before running the example from Eclipse, I created the input table 'simple' in the Hive default database. I pass the input table 'simple' and the output table 'simpid' on the command line. Even though the input table already exists in the default database, when I run this code I get the exception:

java.io.IOException: NoSuchObjectException(message:default.simple table not found)

Questions:

1) Why does the "table not found" exception occur, and how can I fix it?

2) How can the deprecated HCatRecord, HCatSchema, and HCatBaseInputFormat used in this example be converted to the latest stable API?
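
From what I have read, the deprecation seems to stem from HCatalog having been folded into the Hive project: from Hive 0.11 onward the same classes appear to live under the org.apache.hive.hcatalog package prefix, though I have not verified this against my distribution. A sketch of the renamed imports under that assumption:

// Hypothetical imports for the post-0.11 packaging, where HCatalog ships
// with Hive; the class names are unchanged, only the package prefix moves
// from org.apache.hcatalog to org.apache.hive.hcatalog.
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.InputJobInfo;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.data.schema.HCatSchema;

The example code as I currently run it: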

package com.bigdata; 

import java.io.IOException; 
import java.util.*; 
import org.apache.hadoop.conf.*; 
import org.apache.hadoop.io.*; 
import org.apache.hadoop.mapreduce.*; 
import org.apache.hadoop.util.*; 
import org.apache.hcatalog.mapreduce.*; 
import org.apache.hcatalog.data.*; 
import org.apache.hcatalog.data.schema.*; 

public class UseHCat extends Configured implements Tool { 

public static class Map extends Mapper<WritableComparable, HCatRecord, Text, IntWritable> { 
    String groupname; 

    @Override 
    protected void map(WritableComparable key, 
         HCatRecord value, 
         org.apache.hadoop.mapreduce.Mapper<WritableComparable, HCatRecord, 
         Text, IntWritable>.Context context) 
     throws IOException, InterruptedException { 
     // The group table from /etc/group has name, 'x', id 
     groupname = (String) value.get(0); 
     int id = (Integer) value.get(1); 
     // Just select and emit the name and ID 
     context.write(new Text(groupname), new IntWritable(id)); 
    } 
} 

public static class Reduce extends Reducer<Text, IntWritable, 
            WritableComparable, HCatRecord> { 

    protected void reduce(Text key, 
          java.lang.Iterable<IntWritable> values, 
          org.apache.hadoop.mapreduce.Reducer<Text, IntWritable, 
          WritableComparable, HCatRecord>.Context context) 
     throws IOException, InterruptedException { 
     // Only expecting one ID per group name 
     Iterator<IntWritable> iter = values.iterator(); 
     IntWritable iw = iter.next(); 
     int id = iw.get(); 
     // Emit the group name and ID as a record 
     HCatRecord record = new DefaultHCatRecord(2); 
     record.set(0, key.toString()); 
     record.set(1, id); 
     context.write(null, record); 
    } 
} 

public int run(String[] args) throws Exception { 
    Configuration conf = getConf(); 
    args = new GenericOptionsParser(conf, args).getRemainingArgs(); 

    // Get the input and output table names as arguments 
    String inputTableName = args[0]; 
    String outputTableName = args[1]; 
    // Assume the default database 
    String dbName = null; 

    Job job = new Job(conf, "UseHCat"); 
    HCatInputFormat.setInput(job, InputJobInfo.create(dbName, 
      inputTableName, null)); 
    job.setJarByClass(UseHCat.class); 
    job.setMapperClass(Map.class); 
    job.setReducerClass(Reduce.class); 

    // An HCatalog record as input 
    job.setInputFormatClass(HCatInputFormat.class); 

    // Mapper emits a string as key and an integer as value 
    job.setMapOutputKeyClass(Text.class); 
    job.setMapOutputValueClass(IntWritable.class); 

    // Ignore the key for the reducer output; emitting an HCatalog record as value 
    job.setOutputKeyClass(WritableComparable.class); 
    job.setOutputValueClass(DefaultHCatRecord.class); 
    job.setOutputFormatClass(HCatOutputFormat.class); 

    HCatOutputFormat.setOutput(job, OutputJobInfo.create(dbName, 
       outputTableName, null)); 
    HCatSchema s = HCatOutputFormat.getTableSchema(job); 
    System.err.println("INFO: output schema explicitly set for writing:" + s); 
    HCatOutputFormat.setSchema(job, s); 
    return (job.waitForCompletion(true) ? 0 : 1); 
} 

public static void main(String[] args) throws Exception { 
    int exitCode = ToolRunner.run(new UseHCat(), args); 
    System.exit(exitCode); 
} 
} 

When I run this on single-node Hadoop 2.2, I get the following exceptions:

14/03/05 15:17:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive 
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize 
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack 
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node 
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 
14/03/05 15:17:22 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 
14/03/05 15:17:22 INFO metastore.ObjectStore: ObjectStore, initialize called 
14/03/05 15:17:23 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored 
14/03/05 15:17:24 WARN bonecp.BoneCPConfig: Max Connections < 1. Setting to 20 
14/03/05 15:17:25 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" 
14/03/05 15:17:25 INFO metastore.ObjectStore: Initialized ObjectStore 
14/03/05 15:17:27 WARN bonecp.BoneCPConfig: Max Connections < 1. Setting to 20 
14/03/05 15:17:27 INFO metastore.HiveMetaStore: 0: get_database: NonExistentDatabaseUsedForHealthCheck 
14/03/05 15:17:27 INFO HiveMetaStore.audit: ugi=dk ip=unknown-ip-addr cmd=get_database: NonExistentDatabaseUsedForHealthCheck 
14/03/05 15:17:27 ERROR metastore.RetryingHMSHandler: NoSuchObjectException(message:There is no database named nonexistentdatabaseusedforhealthcheck) 

at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:431) 
at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:441) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124) 
at com.sun.proxy.$Proxy6.getDatabase(Unknown Source) 
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:628) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103) 
at com.sun.proxy.$Proxy7.get_database(Unknown Source) 
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:810) 
at org.apache.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.isOpen(HiveClientCache.java:277) 
at org.apache.hcatalog.common.HiveClientCache.get(HiveClientCache.java:147) 
at org.apache.hcatalog.common.HCatUtil.getHiveClient(HCatUtil.java:547) 
at org.apache.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:104) 
at org.apache.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86) 
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:87) 
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:56) 
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:48) 
at com.bigdata.UseHCat.run(UseHCat.java:64) 
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) 
at com.bigdata.UseHCat.main(UseHCat.java:91) 

14/03/05 15:17:27 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=simple 
14/03/05 15:17:27 INFO HiveMetaStore.audit: ugi=dk ip=unknown-ip-addr cmd=get_table : db=default tbl=simple 
14/03/05 15:17:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. 
14/03/05 15:17:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 

    Exception in thread "main" java.io.IOException: NoSuchObjectException(message:default.simple table not found) 

at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:89) 
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:56) 
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:48) 
at com.bigdata.UseHCat.run(UseHCat.java:64) 
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) 
at com.bigdata.UseHCat.main(UseHCat.java:91) 
    Caused by: NoSuchObjectException(message:default.simple table not found) 
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1373) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103) 
at com.sun.proxy.$Proxy7.get_table(Unknown Source) 
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:854) 
at org.apache.hcatalog.common.HCatUtil.getTable(HCatUtil.java:193) 
at org.apache.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:105) 
at org.apache.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86) 
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:87) 
... 6 more 
14/03/05 15:17:29 INFO metastore.HiveMetaStore: 1: Shutting down the object store... 
14/03/05 15:17:29 INFO HiveMetaStore.audit: ugi=dk ip=unknown-ip-addr cmd=Shutting down the object store... 
14/03/05 15:17:29 INFO metastore.HiveMetaStore: 1: Metastore shutdown complete. 
14/03/05 15:17:29 INFO HiveMetaStore.audit: ugi=dk ip=unknown-ip-addr cmd=Metastore shutdown complete.  

Answer


It looks like you don't have hive-site.xml on your classpath in Eclipse. Hive looks up the metastore server address in your configuration; when it can't find an address, it creates or loads an embedded metastore, and that metastore of course does not contain the table you created.
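
If putting hive-site.xml on the classpath is inconvenient, you can also point the job at the metastore explicitly. A minimal sketch, assuming a Thrift metastore on localhost at the conventional port 9083 (substitute your own URI):

// Inside run(), before the HCatInputFormat.setInput(...) call.
// "thrift://localhost:9083" is a placeholder for your metastore address.
Configuration conf = getConf();
conf.set("hive.metastore.uris", "thrift://localhost:9083");
Job job = new Job(conf, "UseHCat");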

EDIT (in response to your comment):

Yes, you can get values by key, but to do that you need the HCatSchema for the table. Get it in the setup phase of the map task...

HCatSchema schema = HCatBaseInputFormat.getTableSchema(context.getConfiguration()); 

...and then in the map phase...

value.get("field", schema); 
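
Putting the two together, a minimal sketch of the mapper from your question using name-based lookups (the column names "name" and "id" are placeholders for whatever your table actually defines; imports are the same as in your listing):

public static class Map extends Mapper<WritableComparable, HCatRecord, Text, IntWritable> {
    private HCatSchema schema;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Fetch the input table schema once per task rather than once per record.
        schema = HCatBaseInputFormat.getTableSchema(context.getConfiguration());
    }

    @Override
    protected void map(WritableComparable key, HCatRecord value, Context context)
            throws IOException, InterruptedException {
        // Look fields up by column name instead of by position.
        String groupname = (String) value.get("name", schema);
        Integer id = (Integer) value.get("id", schema);
        context.write(new Text(groupname), new IntWritable(id));
    }
}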

Thanks! Is there a way to access table metadata, such as field names and their types, from the 'Mapper.map' method? That is, instead of getting HCatRecord field values by index, like 'value.get(0)', use 'value' as an associative array and fetch values by key, like 'value.getKey("aFieldName")'? – dokondr


@dokondr See my update. – climbage


Putting 'hive-site.xml' on the CLASSPATH helped. The next problem: I cannot create the output table from my code and have to do it manually from the Hive shell, running as the 'hive' user. Now, when my code runs under an ordinary user, it has no 'write' permission and I get: 'org.apache.hadoop.security.AccessControlException: Permission denied: user=dk, access=WRITE, inode="/apps/hive/warehouse/simpids":hdfs:hdfs:drwxr-xr-x'. Please advise how to set permissions so that an ordinary user can create tables and write data in Hive. Many thanks! – dokondr