HBase MapReduce dependency issue when using TableMapper

I'm using CDH 5.3 and trying to write a MapReduce program that scans an HBase table and does some processing. I created a mapper extending TableMapper, and the exception I get is:
java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:267)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:388)
As you can see, it's looking for protobuf-java-2.5.0.jar on an HDFS path, but the jar actually exists on the local path /usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar; I verified this. This doesn't happen with normal MapReduce programs; the error occurs only when I use TableMapper.
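My reading of the trace (an assumption, not something I've confirmed): at submit time the job client qualifies every classpath entry that has no scheme against fs.defaultFS before checking it in the distributed cache, so a bare local path silently becomes an hdfs:// URI. A minimal sketch that reproduces just that qualification step, assuming fs.defaultFS points at hdfs://localhost:54310:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifyDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml
        // A schemeless path, like the classpath entry in the stack trace.
        Path jar = new Path("/usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar");
        FileSystem fs = FileSystem.get(conf); // DistributedFileSystem when fs.defaultFS is hdfs://
        // makeQualified resolves the path against the default filesystem, printing
        // hdfs://localhost:54310/usr/local/... - the same path as in the exception.
        System.out.println(fs.makeQualified(jar));
    }
}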
My driver code is as follows:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AppDriver {

    public static void main(String[] args) throws Exception {
        Configuration hbaseConfig = HBaseConfiguration.create();
        hbaseConfig.set("hbase.zookeeper.quorum", PropertiesUtil.getZookeperHostName());
        // note: the key is case-sensitive; it must be clientPort, not clientport
        hbaseConfig.set("hbase.zookeeper.property.clientPort", PropertiesUtil.getZookeperPortNum());

        Job job = Job.getInstance(hbaseConfig, "hbasemapreducejob");
        job.setJarByClass(AppDriver.class);

        // Create a scan
        Scan scan = new Scan();
        scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
        scan.setCacheBlocks(false);  // don't set to true for MR jobs
        // scan.setStartRow(Bytes.toBytes(PropertiesUtil.getHbaseStartRowkey()));
        // scan.setStopRow(Bytes.toBytes(PropertiesUtil.getHbaseStopRowkey()));

        TableMapReduceUtil.initTableMapperJob(PropertiesUtil.getHbaseTableName(), scan,
                ESportMapper.class, Text.class, RecordStatusVO.class, job);
        job.setReducerClass(ESportReducer.class);
        job.setNumReduceTasks(1);
        TableMapReduceUtil.addDependencyJars(job);

        // Write the results to a file in the output directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean b = job.waitForCompletion(true);
        if (!b) {
            throw new IOException("error with job!");
        }
    }
}
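For what it's worth, my understanding (an assumption on my part, not verified) is that initTableMapperJob already ships dependency jars by default, so the explicit addDependencyJars(job) call appends to the same "tmpjars" list a second time, and any entry there without a scheme gets qualified against fs.defaultFS at submit time. A small diagnostic sketch I can add just before waitForCompletion to see which entry lacks a file: prefix:

// Diagnostic sketch: dump the distributed-cache jar list before submitting.
// Any entry without an explicit scheme (file:/ or hdfs://) is a suspect,
// since the job client qualifies schemeless paths against fs.defaultFS.
System.out.println("tmpjars = " + job.getConfiguration().get("tmpjars"));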
I'm passing the properties file as args[0].
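For context, PropertiesUtil is just a small helper of mine that reads that file; a minimal sketch of what it looks like (the loading logic and property keys here are illustrative assumptions, only the getter names match the driver):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

// Illustrative sketch of the PropertiesUtil helper: loads config.properties
// (passed as args[0]) once and exposes the getters the driver calls.
// The property keys are assumptions, not the exact contents of my file.
public class PropertiesUtil {
    private static final Properties PROPS = new Properties();

    public static void load(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            PROPS.load(in);
        }
    }

    public static String getZookeperHostName() { return PROPS.getProperty("zookeeper.host"); }
    public static String getZookeperPortNum()  { return PROPS.getProperty("zookeeper.port"); }
    public static String getHbaseTableName()   { return PROPS.getProperty("hbase.table"); }
}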
Some more background information:
I'm using standalone CDH 5.3 on my local system, with HBase 0.98.6. HBase runs on top of HDFS in pseudo-distributed mode.
My build.gradle is as follows:
apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'application'

// Basic properties
sourceCompatibility = 1.7
targetCompatibility = '1.7'
version = '3.0'
mainClassName = "com.ESport.mapreduce.App.AppDriver"

jar {
    manifest {
        attributes "Main-Class": "$mainClassName"
    }
    from {
        configurations.compile.collect { it.isDirectory() ? it : zipTree(it) }
    }
    zip64 true
}

repositories {
    mavenCentral()
    maven { url "http://clojars.org/repo" }
    maven { url "http://repository.cloudera.com/artifactory/cloudera-repos/" }
}

dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.+'
    compile group: 'commons-collections', name: 'commons-collections', version: '3.2'
    compile 'org.apache.storm:storm-core:0.9.4'
    compile 'org.apache.commons:commons-compress:1.5'
    compile 'org.elasticsearch:elasticsearch:1.7.1'

    compile('org.apache.hadoop:hadoop-client:2.5.0-cdh5.3.0') {
        exclude group: 'org.slf4j'
    }

    compile('org.apache.hbase:hbase-client:0.98.6-cdh5.3.0') {
        exclude group: 'org.slf4j'
        exclude group: 'org.jruby'
        exclude group: 'jruby-complete'
        exclude group: 'org.codehaus.jackson'
    }
    compile 'org.apache.hbase:hbase-common:0.98.6-cdh5.3.0'
    compile 'org.apache.hbase:hbase-server:0.98.6-cdh5.3.0'
    compile 'org.apache.hbase:hbase-protocol:0.98.6-cdh5.3.0'

    compile('com.thinkaurelius.titan:titan-core:0.5.2') {
        exclude group: 'org.slf4j'
    }
    compile('com.thinkaurelius.titan:titan-hbase:0.5.2') {
        exclude group: 'org.apache.hbase'
        exclude group: 'org.slf4j'
    }
    compile('com.tinkerpop.gremlin:gremlin-java:2.6.0') {
        exclude group: 'org.slf4j'
    }

    compile 'org.perf4j:perf4j:0.9.16'

    compile 'com.fasterxml.jackson.core:jackson-core:2.5.3'
    compile 'com.fasterxml.jackson.core:jackson-databind:2.5.3'
    compile 'com.fasterxml.jackson.core:jackson-annotations:2.5.3'
    compile 'com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.1.2'
}
and I'm using this command to run the jar:
hadoop jar ESportingMapReduce-3.0.jar config.properties /myoutput
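Since the failure happens at submit time when classpath entries are qualified, one workaround I'm considering (a sketch only, not verified against this setup) is to replace the blanket addDependencyJars(job) call with the per-class variant, which locates each named class's containing jar on the local classpath and uploads it explicitly:

// Sketch: ship only the jars the job needs by naming representative classes.
// The class choices below are assumptions about what this job requires.
TableMapReduceUtil.addDependencyJars(job.getConfiguration(),
        org.apache.hadoop.hbase.client.Put.class,             // hbase-client
        org.apache.hadoop.hbase.mapreduce.TableMapper.class,  // hbase-server
        com.google.protobuf.Message.class);                   // protobuf-java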
Also, I'd like to add that this jar is not an HBase dependency jar but a Hadoop jar that lives in the Hadoop common lib path, as you can see. The HBase jars I attached to the build are listed in build.gradle. – kundan
Try adding HADOOP_CLASSPATH in hbase-env.sh. – mbaxi
Please let me know what's in your config.properties file. – Thanga