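Basically, you want to run an Oozie workflow for a bunch of MR jobs at a scheduled time of day, depending on data availability. You need to define decision nodes to check whether the data exists, and map-reduce actions to run the MapReduce jobs. You can also define an email notification for job failures. You can find the details here: MapReduce Node, Decision Node, Oozie Actions Documentation. I have given a sample decision node and map-reduce node below, along with the job.properties file. Here is the command to run the Oozie workflow; you can schedule it with cron to run daily at a specific time.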
oozie job -config job.properties -D param1=value -run
<workflow-app xmlns="uri:oozie:workflow:0.4" name="${app_name}">
<global>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
</global>
<start to="data1_check"/>
<decision name="data1_check">
<switch>
<case to="data1_job">${fs:exists(inputDir)}</case>
<default to="data2_check"/>
</switch>
</decision>
<action name='data1_job'>
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
</prepare>
<configuration>
<property>
<name>mapred.mapper.class</name>
<value>org.myorg.WordCount.Map</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>org.myorg.WordCount.Reduce</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>${inputDir}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>${outputDir}</value>
</property>
</configuration>
</map-reduce>
<ok to="data2_check"/>
<error to="data2_check"/>
</action>
<!-- Both success and failure of data1_job lead to the data2_check decision node,
     so that the next data job still gets a chance to run. To stop the workflow on
     failure instead, point the error transition at the kill node ("fail").
     Your last MR action should go to the kill node on failure and to the end node on success. -->
<kill name="fail">
<message>Error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end" />
</workflow-app>
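The workflow above does not show the `data2_check` node that both transitions point to, nor the email notification mentioned at the start. A minimal sketch of how those pieces might look, placed inside `<workflow-app>` before the `<kill>` node, follows. The names `data2_job`, `notify_failure`, `input2Dir`, `output2Dir` and `emailTo` are assumptions made for illustration and would need matching entries in job.properties; `input2Dir` is assumed to be a path that `fs:exists` can resolve, and the email action assumes SMTP is configured on the Oozie server.

```xml
<!-- Sketch only: data2_job, notify_failure, input2Dir, output2Dir and emailTo are hypothetical names. -->
<decision name="data2_check">
    <switch>
        <!-- Run the second job only when its input exists; otherwise go straight to the end node. -->
        <case to="data2_job">${fs:exists(input2Dir)}</case>
        <default to="end"/>
    </switch>
</decision>
<action name="data2_job">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.mapper.class</name>
                <value>org.myorg.WordCount.Map</value>
            </property>
            <property>
                <name>mapred.reducer.class</name>
                <value>org.myorg.WordCount.Reduce</value>
            </property>
            <property>
                <name>mapred.input.dir</name>
                <value>${input2Dir}</value>
            </property>
            <property>
                <name>mapred.output.dir</name>
                <value>${output2Dir}</value>
            </property>
        </configuration>
    </map-reduce>
    <!-- Last MR action: success ends the workflow, failure triggers the notification. -->
    <ok to="end"/>
    <error to="notify_failure"/>
</action>
<action name="notify_failure">
    <email xmlns="uri:oozie:email-action:0.1">
        <to>${emailTo}</to>
        <subject>Workflow ${wf:id()} failed</subject>
        <body>Error message[${wf:errorMessage(wf:lastErrorNode())}]</body>
    </email>
    <!-- Whether or not the mail could be sent, finish at the kill node. -->
    <ok to="fail"/>
    <error to="fail"/>
</action>
```

With this layout, a day on which one input is missing simply skips that job rather than failing the whole workflow.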
Here is the job.properties file.
# Or use a remote server URL, e.g. hdfs://abc.xyz.yahoo.com:8020
nameNode=hdfs://localhost:9000
# Or use a remote server URL, e.g. abc.xyz.yahoo.com:50300
jobTracker=localhost:9001
queueName=default
examplesRoot=map-reduce
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}
inputDir=input-data
outputDir=map-reduce
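What do you mean by 'time bound'? Could you elaborate on your third point? – YoungHobbit
By time bound, I mean that the jobs should run at the same time. As for the third point: for example, today I may receive 3 sets of files for the workflow, but tomorrow I may receive 2 sets instead of 3. So the user should be able to skip a step in that case. – Satya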