2012-09-04 127 views
20

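Confused about the Hadoop JobTracker API

I'm trying to collect some information from the job tracker. For starters, I want to get information about running jobs, such as the job ID or job name, but I'm already stuck. Here is what I have so far (it prints the job IDs of the currently running jobs):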

public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "zk1.myhost,zk2.myhost,zk3.myhost");
    conf.set("hbase.zookeeper.property.clientPort", "2181");

    InetSocketAddress jobtracker = new InetSocketAddress("jobtracker.mapredhost.myhost", 8021);
    JobClient jobClient = new JobClient(jobtracker, conf);
    JobStatus[] jobs = jobClient.jobsToComplete();

    for (int i = 0; i < jobs.length; i++) {
        JobStatus js = jobs[i];
        if (js.getRunState() == JobStatus.RUNNING) {
            JobID jobId = js.getJobID();
            System.out.println(jobId);
        }
    }
}

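The code above works like a charm for printing the job ID, but now I want to print the job name as well. So I added this line right after the one that prints the job ID: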

System.out.println(jobClient.getJob(jobId).getJobName()); 

and I get this exception:

Exception in thread "main" java.lang.NullPointerException 
    at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:226) 
    at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1080) 
    at org.apache.test.JobTracker.main(JobTracker.java:28) 

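jobClient is not null; I know this because I checked it with a null-check statement. It is the jobClient.getJob(jobId) call that fails with the null pointer. What am I doing wrong here?

According to the API I should be fine: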

http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapred/JobClient.html#getJob(org.apache.hadoop.mapred.JobID)

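First get a RunningJob from the jobClient, and once you have the running job, get its name: http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapred/RunningJob.html#getJobName()

Has anyone done something like this before? I could use jsoup to fetch this information with a GET request, but I think the API is the better way to get it.

Question update: here are my Hadoop/HBase dependencies: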

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>0.23.1-mr1-cdh4.0.0b2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>0.23.1-mr1-cdh4.0.0b2</version>
    <exclusions>
        <exclusion>
            <groupId>org.mortbay.jetty</groupId>
            <artifactId>jetty</artifactId>
        </exclusion>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.92.1-cdh4b2-SNAPSHOT</version>
</dependency>

Bounty update:

Here are my imports:

import java.io.IOException; 
import java.net.InetSocketAddress; 

import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.hbase.HBaseConfiguration; 
import org.apache.hadoop.mapred.JobClient; 
import org.apache.hadoop.mapred.JobID; 
import org.apache.hadoop.mapred.JobStatus; 

Here is the output of System.out.println(jobId):

job_201207031810_1603 

Only one job is running at the moment.

+1

What version are you using? 0.21, as in your documentation links?

+0

Hi Thomas, that's a good observation. I'll update my question.

+0

So your cluster is running CDH4 0.23.1, like your dependencies?

Answers

17

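Take a look at NetworkedJob, the inner class of JobClient
(source: /home/user/hadoop/src/mapred/org/apache/hadoop/mapred/JobClient.java)

Its constructor tries to get the Configuration object from the enclosing JobClient at line 225, but that object is null because new JobClient(InetSocketAddress jobTrackAddr, Configuration conf) does not set it: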

// Set the completion poll interval from the configuration. 
     // Default is 5 seconds. 
     Configuration conf = JobClient.this.getConf(); 
     this.completionPollIntervalMillis = conf.getInt(COMPLETION_POLL_INTERVAL_KEY, 
      DEFAULT_COMPLETION_POLL_INTERVAL); //NPE occurs here! 

As a workaround, set it manually after creating the JobClient object. This will fix your problem:

.. 
JobClient jobClient = new JobClient(jobtracker, conf); 
jobClient.setConf(conf); 
.... 
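The mechanism described above can be reproduced without a Hadoop cluster. The following is only a minimal sketch, not Hadoop code: Client, Job, the "poll.interval" key and the tracker address are hypothetical stand-ins for JobClient, NetworkedJob, the poll-interval key and a real JobTracker host. It assumes, as the answer states, that the two-argument constructor never stores the configuration it receives.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfNpeDemo {

    /** Hypothetical stand-in for JobClient; not the real Hadoop class. */
    static class Client {
        // Plays the role of the Configuration held by the client.
        private Map<String, Integer> conf;

        /** Accepts a conf but never stores it, like the constructor described above. */
        Client(String trackerAddress, Map<String, Integer> conf) {
            // Connection setup would go here; this.conf stays null.
        }

        void setConf(Map<String, Integer> conf) {
            this.conf = conf;
        }

        Map<String, Integer> getConf() {
            return conf;
        }

        /** Hypothetical stand-in for NetworkedJob: reads the outer client's conf. */
        class Job {
            final int pollIntervalMillis;

            Job() {
                Map<String, Integer> c = Client.this.getConf();
                // Throws NullPointerException when the conf was never set.
                this.pollIntervalMillis = c.getOrDefault("poll.interval", 5000);
            }
        }

        Job getJob() {
            return new Job();
        }
    }

    /** Returns true if getJob() fails with a NullPointerException. */
    static boolean getJobThrowsNpe(Client client) {
        try {
            client.getJob();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> conf = new HashMap<>();
        Client client = new Client("jobtracker.example:8021", conf);

        System.out.println("NPE before setConf: " + getJobThrowsNpe(client));
        client.setConf(conf); // the workaround from the answer
        System.out.println("NPE after setConf: " + getJobThrowsNpe(client));
    }
}
```

Run as written, the first line prints true and the second prints false, which matches the behaviour reported in the question and the effect of calling setConf right after constructing the client.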

Side note:

I instantiated the Configuration object like this:

Configuration conf = new Configuration(); 
conf.addResource(new Path("/path_to/core-site.xml")); 
conf.addResource(new Path("/path_to/hdfs-site.xml")); 
+0

Excellent observation, sir! Manually calling setConf on the jobClient does the trick. I can't assign the bounty, though.

+0

@GandalfStormCrow You can always click the small +250 button next to Lorand's answer to award the bounty – HypnoticSheep