HBase MapReduce dependency issue when using TableMapper

I'm using CDH 5.3 and trying to write a MapReduce program that scans an HBase table and does some processing. I created a mapper extending TableMapper, and the exception I get is:
java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:267)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:388)
As you can see, it's looking for protobuf-java-2.5.0.jar on an HDFS path, but the jar actually exists on the local path /usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar; I verified this. This doesn't happen with normal MapReduce programs; the error occurs only when I use TableMapper.
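My reading of the trace (an assumption, not something I've confirmed): at submit time the job client qualifies every classpath entry that has no scheme against fs.defaultFS before checking it in the distributed cache, so a bare local path silently becomes an hdfs:// URI. A minimal sketch that reproduces just that qualification step, assuming fs.defaultFS points at hdfs://localhost:54310:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifyDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml
        // A schemeless path, like the classpath entry in the stack trace.
        Path jar = new Path("/usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar");
        FileSystem fs = FileSystem.get(conf); // DistributedFileSystem when fs.defaultFS is hdfs://
        // makeQualified resolves the path against the default filesystem, printing
        // hdfs://localhost:54310/usr/local/... - the same path as in the exception.
        System.out.println(fs.makeQualified(jar));
    }
}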
My driver code is as follows:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AppDriver {

    public static void main(String[] args) throws Exception {
        Configuration hbaseConfig = HBaseConfiguration.create();
        hbaseConfig.set("hbase.zookeeper.quorum", PropertiesUtil.getZookeperHostName());
        // note: the key is case-sensitive; it must be clientPort, not clientport
        hbaseConfig.set("hbase.zookeeper.property.clientPort", PropertiesUtil.getZookeperPortNum());

        Job job = Job.getInstance(hbaseConfig, "hbasemapreducejob");
        job.setJarByClass(AppDriver.class);

        // Create a scan
        Scan scan = new Scan();
        scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
        scan.setCacheBlocks(false);  // don't set to true for MR jobs
        // scan.setStartRow(Bytes.toBytes(PropertiesUtil.getHbaseStartRowkey()));
        // scan.setStopRow(Bytes.toBytes(PropertiesUtil.getHbaseStopRowkey()));

        TableMapReduceUtil.initTableMapperJob(PropertiesUtil.getHbaseTableName(), scan,
                ESportMapper.class, Text.class, RecordStatusVO.class, job);
        job.setReducerClass(ESportReducer.class);
        job.setNumReduceTasks(1);
        TableMapReduceUtil.addDependencyJars(job);

        // Write the results to a file in the output directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean b = job.waitForCompletion(true);
        if (!b) {
            throw new IOException("error with job!");
        }
    }
}
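For what it's worth, my understanding (an assumption on my part, not verified) is that initTableMapperJob already ships dependency jars by default, so the explicit addDependencyJars(job) call appends to the same "tmpjars" list a second time, and any entry there without a scheme gets qualified against fs.defaultFS at submit time. A small diagnostic sketch I can add just before waitForCompletion to see which entry lacks a file: prefix:

// Diagnostic sketch: dump the distributed-cache jar list before submitting.
// Any entry without an explicit scheme (file:/ or hdfs://) is a suspect,
// since the job client qualifies schemeless paths against fs.defaultFS.
System.out.println("tmpjars = " + job.getConfiguration().get("tmpjars"));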
I'm passing the properties file as args[0].
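For context, PropertiesUtil is just a small helper of mine that reads that file; a minimal sketch of what it looks like (the loading logic and property keys here are illustrative assumptions, only the getter names match the driver):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

// Illustrative sketch of the PropertiesUtil helper: loads config.properties
// (passed as args[0]) once and exposes the getters the driver calls.
// The property keys are assumptions, not the exact contents of my file.
public class PropertiesUtil {
    private static final Properties PROPS = new Properties();

    public static void load(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            PROPS.load(in);
        }
    }

    public static String getZookeperHostName() { return PROPS.getProperty("zookeeper.host"); }
    public static String getZookeperPortNum()  { return PROPS.getProperty("zookeeper.port"); }
    public static String getHbaseTableName()   { return PROPS.getProperty("hbase.table"); }
}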
Some more background information:
I'm using standalone CDH 5.3 on my local system, with HBase 0.98.6. HBase runs on top of HDFS in pseudo-distributed mode.
My build.gradle is as follows:
apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'application'

// Basic properties
sourceCompatibility = 1.7
targetCompatibility = '1.7'
version = '3.0'
mainClassName = "com.ESport.mapreduce.App.AppDriver"

jar {
    manifest {
        attributes "Main-Class": "$mainClassName"
    }
    from {
        configurations.compile.collect { it.isDirectory() ? it : zipTree(it) }
    }
    zip64 true
}

repositories {
    mavenCentral()
    maven { url "http://clojars.org/repo" }
    maven { url "http://repository.cloudera.com/artifactory/cloudera-repos/" }
}

dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.+'
    compile group: 'commons-collections', name: 'commons-collections', version: '3.2'
    compile 'org.apache.storm:storm-core:0.9.4'
    compile 'org.apache.commons:commons-compress:1.5'
    compile 'org.elasticsearch:elasticsearch:1.7.1'

    compile('org.apache.hadoop:hadoop-client:2.5.0-cdh5.3.0') {
        exclude group: 'org.slf4j'
    }

    compile('org.apache.hbase:hbase-client:0.98.6-cdh5.3.0') {
        exclude group: 'org.slf4j'
        exclude group: 'org.jruby'
        exclude group: 'jruby-complete'
        exclude group: 'org.codehaus.jackson'
    }
    compile 'org.apache.hbase:hbase-common:0.98.6-cdh5.3.0'
    compile 'org.apache.hbase:hbase-server:0.98.6-cdh5.3.0'
    compile 'org.apache.hbase:hbase-protocol:0.98.6-cdh5.3.0'

    compile('com.thinkaurelius.titan:titan-core:0.5.2') {
        exclude group: 'org.slf4j'
    }
    compile('com.thinkaurelius.titan:titan-hbase:0.5.2') {
        exclude group: 'org.apache.hbase'
        exclude group: 'org.slf4j'
    }
    compile('com.tinkerpop.gremlin:gremlin-java:2.6.0') {
        exclude group: 'org.slf4j'
    }

    compile 'org.perf4j:perf4j:0.9.16'

    compile 'com.fasterxml.jackson.core:jackson-core:2.5.3'
    compile 'com.fasterxml.jackson.core:jackson-databind:2.5.3'
    compile 'com.fasterxml.jackson.core:jackson-annotations:2.5.3'
    compile 'com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.1.2'
}
and I'm using this command to run the jar:
hadoop jar ESportingMapReduce-3.0.jar config.properties /myoutput
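Since the failure happens at submit time when classpath entries are qualified, one workaround I'm considering (a sketch only, not verified against this setup) is to replace the blanket addDependencyJars(job) call with the per-class variant, which locates each named class's containing jar on the local classpath and uploads it explicitly:

// Sketch: ship only the jars the job needs by naming representative classes.
// The class choices below are assumptions about what this job requires.
TableMapReduceUtil.addDependencyJars(job.getConfiguration(),
        org.apache.hadoop.hbase.client.Put.class,             // hbase-client
        org.apache.hadoop.hbase.mapreduce.TableMapper.class,  // hbase-server
        com.google.protobuf.Message.class);                   // protobuf-java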
Also, I'd like to add that this jar is not an HBase dependency jar but a Hadoop jar that lives in the Hadoop common lib path, as you can see. The HBase jars I attached to the build are listed in build.gradle. – kundan
Try adding HADOOP_CLASSPATH in hbase-env.sh. – mbaxi
Please let me know what's in your config.properties file. – Thanga