2012-09-24 26 views
1

I want to schedule an HBase Map-Reduce job with Oozie, and I am facing the following problem. How do I schedule an HBase Map-Reduce job with Oozie?

How and where do I specify these properties in the Oozie workflow?
( i> Table name for the Mapper/Reducer
    ii> Scan object for the Mapper )


    Scan scan = new Scan(new Get()); 

    scan.setMaxVersions(); 

    scan.addColumn(Bytes.toBytes(FAMILY), 
      Bytes.toBytes(VALUE)); 
    scan.addColumn(Bytes.toBytes(FAMILY), 
      Bytes.toBytes(DATE)); 

    Job job = new Job(conf, JOB_NAME + "_" + TABLE_USER); 
    // These two calls are what I need to express in the workflow:
    TableMapReduceUtil.initTableMapperJob(TABLE_USER, scan, 
      Mapper.class, Text.class, Text.class, job); 
    TableMapReduceUtil.initTableReducerJob(DETAILS_TABLE, 
      Reducer.class, job); 

Please let me know the best way to schedule an HBase Map-Reduce job with Oozie.

Thanks :)

Answers

3

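The best way (in my opinion) to schedule an HBase Map-Reduce job is to run it as a Java action. It works well, and there is no need to write code to convert the Scan to a string and so on. So I am scheduling my jobs as Java actions until I find a better option.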

<workflow-app xmlns="uri:oozie:workflow:0.1" name="java-main-wf">
    <start to="java-node"/>
    <action name="java-node">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <main-class>org.apache.oozie.example.DemoJavaMain</main-class>
            <arg>Hello</arg>
            <arg>Oozie!</arg>
            <arg>This</arg>
            <arg>is</arg>
            <arg>Demo</arg>
            <arg>Oozie!</arg>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
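
For the Java-action route, a minimal sketch of what the main class might look like is given here, since the workflow above only shows the Oozie demo class. The class name `HBaseJobDriver`, the `JobSubmitter` interface and the argument order are all hypothetical; the idea is only that the two table names from the question can arrive as `<arg>` values, and that the action fails by throwing when the job does not succeed.

```java
import java.util.Arrays;

/** Hypothetical entry point for an Oozie <java> action; all names here are illustrative. */
final class HBaseJobDriver {

    /** Stand-in for the HBase job setup shown in the question. */
    interface JobSubmitter {
        /** Returns true when the MapReduce job completed successfully. */
        boolean run(String inputTable, String outputTable) throws Exception;
    }

    /** Reads both table names from the <arg> values and runs the job. */
    static void runAction(String[] args, JobSubmitter submitter) throws Exception {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "expected <inputTable> <outputTable>, got " + Arrays.toString(args));
        }
        // Throwing here is how this sketch signals failure to the workflow.
        if (!submitter.run(args[0], args[1])) {
            throw new IllegalStateException("HBase MapReduce job failed");
        }
    }

    public static void main(String[] args) throws Exception {
        runAction(args, (inputTable, outputTable) -> {
            // Assumption: build the Scan and Job here as in the question, e.g.
            //   TableMapReduceUtil.initTableMapperJob(inputTable, scan, ...);
            //   TableMapReduceUtil.initTableReducerJob(outputTable, ...);
            //   return job.waitForCompletion(true);
            throw new UnsupportedOperationException("plug in the HBase job setup here");
        });
    }
}
```

With a class like this, the workflow would name it in `<main-class>` and pass `<arg>TABLE_USER</arg>` and `<arg>DETAILS_TABLE</arg>` in place of the demo arguments.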

+0

Hi @100gods, could you give a sample? I tried to run a MapReduce job that exports data from HBase as a Java action in Oozie, but it failed. I am quite confused now. – BigPotato

0

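You can also schedule the job using the <map-reduce> tag, but it is not as easy as scheduling it as a Java action. It takes considerable effort, but it can be considered as an alternative.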

 <action name='jobSample'> 
     <map-reduce> 
      <job-tracker>${jobTracker}</job-tracker> 
      <name-node>${nameNode}</name-node> 
      <configuration> 
       <!-- This is required for new api usage --> 
       <property> 
        <name>mapred.mapper.new-api</name> 
        <value>true</value> 
       </property> 
       <property> 
        <name>mapred.reducer.new-api</name> 
        <value>true</value> 
       </property> 
       <!-- HBASE CONFIGURATIONS --> 
       <property> 
        <name>hbase.mapreduce.inputtable</name> 
        <value>TABLE_USER</value> 
       </property> 
       <property> 
        <name>hbase.mapreduce.scan</name> 
        <value>${wf:actionData('get-scanner')['scan']}</value> 
       </property> 
       <property> 
        <name>hbase.zookeeper.property.clientPort</name> 
        <value>${hbaseZookeeperClientPort}</value> 
       </property> 
       <property> 
        <name>hbase.zookeeper.quorum</name> 
        <value>${hbaseZookeeperQuorum}</value> 
       </property> 
       <!-- MAPPER CONFIGURATIONS --> 
       <property> 
        <name>mapreduce.inputformat.class</name> 
        <value>org.apache.hadoop.hbase.mapreduce.TableInputFormat</value> 
       </property> 
       <property> 
        <name>mapred.mapoutput.key.class</name> 
        <value>org.apache.hadoop.io.Text</value> 
       </property> 
       <property> 
        <name>mapred.mapoutput.value.class</name> 
        <value>org.apache.hadoop.io.Text</value> 
       </property> 
       <property> 
        <name>mapreduce.map.class</name> 
        <value>com.hbase.mapper.MyTableMapper</value> 
       </property> 
       <!-- REDUCER CONFIGURATIONS --> 
       <property> 
        <name>mapreduce.reduce.class</name> 
        <value>com.hbase.reducer.MyTableReducer</value> 
       </property> 
       <property> 
        <name>hbase.mapred.outputtable</name> 
        <value>DETAILS_TABLE</value> 
       </property> 
       <property> 
        <name>mapreduce.outputformat.class</name> 
        <value>org.apache.hadoop.hbase.mapreduce.TableOutputFormat</value> 
       </property> 

       <property> 
        <name>mapred.map.tasks</name> 
        <value>${mapperCount}</value> 
       </property> 
       <property> 
        <name>mapred.reduce.tasks</name> 
        <value>${reducerCount}</value> 
       </property> 
       <property> 
        <name>mapred.job.queue.name</name> 
        <value>${queueName}</value> 
       </property> 
      </configuration> 
     </map-reduce> 
     <ok to="end" /> 
     <error to="fail" /> 
    </action> 
    <kill name="fail"> 
     <message>Map/Reduce failed, error 
      message[${wf:errorMessage(wf:lastErrorNode())}]</message> 
    </kill> 
    <end name='end' /> 
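
The property names in a configuration like the one above have to match what the job actually reads. One way to check them, sketched here under assumptions, is to list the name/value pairs from a Hadoop-style configuration XML file, for example one saved from a run of the same job submitted from Java code. The class name `ConfDump` is hypothetical, and the fallback to the `oozie.action.conf.xml` system property assumes the code runs inside an Oozie Java action.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper that lists the properties in a Hadoop-style configuration file. */
final class ConfDump {

    /** Parses <property><name>..</name><value>..</value></property> entries, sorted by name. */
    static Map<String, String> read(File xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> out = new TreeMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element property = (Element) props.item(i);
            String name = childText(property, "name");
            if (name != null) {
                out.put(name, childText(property, "value"));
            }
        }
        return out;
    }

    /** Returns the trimmed text of the first child with the given tag, or null if absent. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the path is given as an argument, or comes from the Oozie launcher.
        String path = args.length > 0 ? args[0] : System.getProperty("oozie.action.conf.xml");
        if (path == null) {
            throw new IllegalArgumentException("usage: ConfDump <configuration.xml>");
        }
        for (Map.Entry<String, String> e : read(new File(path)).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Comparing this listing against the `<property>` entries in the workflow shows which keys, such as `hbase.mapreduce.inputtable` or `hbase.mapreduce.scan`, still need to be supplied.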

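To learn more about the property names and values, dump the configuration parameters. Also, the scan property is a serialization of the scan information (a Base64-encoded version), so I am not sure how to specify something like this in the workflow: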

scan.addColumn(Bytes.toBytes(FAMILY), 
      Bytes.toBytes(VALUE));
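
The workflow above reads the scan from `${wf:actionData('get-scanner')['scan']}`, which suggests a preceding Java action named `get-scanner` that captures its output. A minimal sketch of the output side of such an action follows. It assumes the action declares `<capture-output/>` and that Oozie supplies the output file path in the `oozie.action.output.properties` system property. The class name `CaptureOutputWriter` is hypothetical, and the encoded scan string is taken as an input here because producing it needs HBase itself (for example `TableMapReduceUtil.convertScanToString`, which may not be publicly accessible in every HBase version).

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

/** Hypothetical helper that writes one key/value pair as Oozie captured output. */
final class CaptureOutputWriter {

    /** Stores the pair in Java properties format in the given file. */
    static void write(File outputFile, String key, String value) throws IOException {
        Properties props = new Properties();
        props.setProperty(key, value);
        try (OutputStream out = new FileOutputStream(outputFile)) {
            props.store(out, null);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: set by Oozie when the action declares <capture-output/>.
        String path = System.getProperty("oozie.action.output.properties");
        if (path == null || args.length != 1) {
            throw new IllegalArgumentException("expected one argument and an Oozie output path");
        }
        // args[0] stands in for the string a real action would get by building
        // the Scan (addColumn, setMaxVersions, ...) and serializing it with HBase.
        write(new File(path), "scan", args[0]);
    }
}
```

In this arrangement the `addColumn` calls stay in Java code inside the `get-scanner` action, and only the resulting string travels through the workflow.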