<workflow-app name="Oozie_app" xmlns="uri:oozie:workflow:0.1">
<start to="TransformWeatherData"/>
<action name="TransformWeatherData">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<exec>/home/kingsly/working_directory/copyFromLocal.sh</exec>
<file>/home/kingsly/working_directory/copyFromLocal.sh</file>
</shell>
<ok to="Oozie_app"/>
<error to="end"/>
</action>
<end name='end' />
</workflow-app>
I am new to Oozie and I have created a workflow and a job.properties file.
This is how my workflow.xml looks.
When I submit this workflow I get the following error:
Error: E0708 : E0708: Invalid transition, node [TransformWeatherData] transition [Oozie_app]
Please help me resolve this.
My main objective is to move a file from the local machine to HDFS; I have included the Hadoop command in the shell script.
You were referring to a node that does not exist: your <ok> transition points at [Oozie_app], which is the workflow name, not a node defined in the workflow. I fixed this:
<workflow-app name="Oozie_app" xmlns="uri:oozie:workflow:0.1">
<start to="TransformWeatherData"/>
<action name="TransformWeatherData">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<exec>/home/kingsly/working_directory/copyFromLocal.sh</exec>
<file>/home/kingsly/working_directory/copyFromLocal.sh</file>
</shell>
<ok to="end"/>
<error to="kill" />
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
Is there a way to set up a global variable for all actions in the workflow?
I need to define a variable containing a value; the same variable will then be modified in the actions.
I tried:
<workflow-app name="test" xmlns="uri:oozie:workflow:0.5">
<global>
<configuration>
<property>
<name>variable1</name>
<value>/some/path</value>
</property>
</configuration>
</global>
.....
<action name="wf1">
....
<property>
<name>variable1</name>
<value>/some/other/path</value>
</property>
</action>
....
<action name="wf2">
.....
<property>
<name>variable1</name>
<value>/some/second/path</value>
</property>
....
</action>
<action name="createFolder">
<fs>
<mkdir path="${variable1}"/>
</fs>
<ok to="End"/>
<error to="Kill"/>
</action>
I would like to let actions modify the value and then use it in another action. Is it possible? Right now I'm getting "VARIABLE variable1 cannot be resolved".
You can do this using action configuration: a value set in an action's own <configuration> block overrides the one from the <global> section.
You can even define default values for each action type.
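For the per-action defaults mentioned above, Oozie also honors a config-default.xml file placed in the workflow application directory next to workflow.xml. A minimal sketch (the property name mirrors the question; the path value is illustrative):

```xml
<!-- config-default.xml: lives in the workflow application directory.
     Values here are defaults with the lowest precedence; they are
     overridden by job.properties and by an action's own
     <configuration> block. -->
<configuration>
  <property>
    <name>variable1</name>
    <value>/some/default/path</value>
  </property>
</configuration>
```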
I have created a new Oozie workflow in the Hue UI based on the below HQL query.
things.hql
drop table output_table;
create table output_table like things;
insert overwrite table output_table select t.* from things_dup td right outer join things t on (td.item_id = t.item_id) where td.item_id is null;
insert overwrite table things_dup select * from things;
The tables are:

things table
item_id  product
1        soap
2        chocklate

things_dup
item_id  product
1        soap
When I run the HQL separately with
hive -f things.hql
it works fine and the things_dup table is updated properly.
But when I run the workflow, the things_dup table is not updated by this statement:
insert overwrite table things_dup select * from things;
Does anyone know why? Please help me fix this issue.
Workflow.xml
<workflow-app name="Things_workflow" xmlns="uri:oozie:workflow:0.4">
<start to="Things_workflow"/>
<action name="Things_workflow">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>/user/cloudera/hive-site.xml</job-xml>
<script>things.hql</script>
<file>/user/cloudera/hive-site.xml#hive-site.xml</file>
</hive>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
action
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>localhost.localdomain:8021</job-tracker>
<name-node>hdfs://localhost.localdomain:8020</name-node>
<job-xml>/user/cloudera/hive-site.xml</job-xml>
<script>things.hql</script>
<file>/user/cloudera/hive-site.xml#hive-site.xml</file>
</hive>
Thanks,
manimekalai
I also ran into an issue with the Hive action in Oozie, but finally solved it.
Please match your workflow.xml against this one.
Make sure to put the <job-xml> line before the <configuration> node.
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-hive-wf">
<start to="hive-node" />
<action name="hive-node">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>oozie.hive.defaults</name>
<value>/user/cloudera/oozie/pig_hive_data_WF/hive-site.xml</value>
</property>
</configuration>
<script>/user/cloudera/oozie/pig_hive_data_WF/load_data.q</script>
<param>LOCATION=/user/${wf:user()}/oozie/pig_hive_data_WF/output/pig_loaded_data</param>
<file>hive-conf.xml#hive-conf.xml</file>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Pig Hive job is failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end" />
</workflow-app>
You can copy the hive-site.xml file from /etc/hive/conf/hive-site.xml and put it in the workflow directory.
I want to pass the -Dpig.notification.listener argument to a Pig action in Oozie. But when I do this, the Pig job fails and there is no clear error description. Below is the Pig action snippet. Any suggestion on how to pass a listener to a Pig action?
<action name="FinalReport">
<pig>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/home/hadoop/surjan/path" />
</prepare>
<script>${nameNode}/home/hadoop/surjan/wf/pigs/FinalReport.pig</script>
<param>inDate=20160430</param>
</pig>
<ok to="success" />
<error to="kill" />
</action>
EDIT: I found the solution for passing the -D arguments. Just in case anybody is facing the same issue:
<action name="FinalReport">
<pig>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/home/hadoop/surjan/path" />
</prepare>
<configuration>
<property>
<name>pig.notification.listener</name>
<value>com.surjan.util.counter.MyListener</value>
</property>
</configuration>
<script>${nameNode}/home/hadoop/surjan/wf/pigs/FinalReport.pig</script>
<param>inDate=20160430</param>
<argument>-Dpig.notification.listener=com.surjan.util.counter.CounterListener</argument>
<archive>archive/mycounter.jar#mycounter</archive>
</pig>
<ok to="success" />
<error to="kill" />
</action>
I have written an Oozie workflow to copy a file from the local filesystem to HDFS. It does not show any error after running the job, but it does not put the file onto HDFS.
Here are my files.
job.properties
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
queueName=default
oozie.wf.application.path=${nameNode}/crazyoozie
focusNodeLogin=cloudera
shellScriptPath= /home/cloudera/Desktop/script.sh
workflow.xml
<workflow-app name="WorkFlowForSshAction" xmlns="uri:oozie:workflow:0.1">
<start to="sshAction"/>
<action name="sshAction">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${focusNodeLogin}</host>
<command>${shellScriptPath}</command>
<capture-output/>
</ssh>
<ok to="end"/>
<error to="killAction"/>
</action>
<kill name="killAction">
<message>"Killed job due to error"</message>
</kill>
<end name="end"/>
</workflow-app>
script.sh
hadoop fs -put /home/cloudera/Desktop/oozieinput /oozieresults-sshAction
status=$?
if [ "$status" -eq 0 ]; then
echo "STATUS=SUCCESS"
else
echo "STATUS=FAIL"
fi
script.sh is present on the local FS, and the output directory oozieresults-sshAction exists on HDFS.
Could you please help me with this?
I see that you are using an ssh action to run a shell script, which is the conflict here. For executing a shell script you need to create a shell action, which will look like this:
<workflow-app name="WorkFlowForShellAction" xmlns="uri:oozie:workflow:0.3">
<start to='shell' />
<action name='shell'>
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${shellScriptPath}</exec>
<file>${shellScriptPath}#${shellscript}</file>
</shell>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
===============================================================
JOB.PROPERTIES
===============================================================
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
queueName=default
oozie.wf.application.path=${nameNode}/crazyoozie
focusNodeLogin=cloudera
shellScriptPath=/path/to/hdfs/script.sh
shellscript=script.sh
In shellScriptPath, give the HDFS path where you placed the script.
I am trying to shorten my Oozie workflow, and I am referring to this Cloudera post:
http://blog.cloudera.com/blog/2013/11/how-to-shorten-your-oozie-workflow-definitions/
Per this post, we can add a job.xml file in the global section, but it's not working for me.
<global>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>job1.xml</job-xml>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
</global>
-----------------------Main Workflow------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="main-wf">
<global>
<job-xml>/user/${wf:user()}/${examplesRoot}/apps/map-reduce/job.xml</job-xml>
</global>
<start to="main-node"/>
<action name="main-node">
<sub-workflow>
<app-path>/user/${wf:user()}/${examplesRoot}/apps/map-reduce/workflow.xml</app-path>
</sub-workflow>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>MR failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
-----------------------Sub Workflow------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="OozieBDU-wf">
<start to="wordcount-node"/>
<action name="wordcount-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>com.yumecorp.WordCount$TokenizerMapper</value>
</property>
<property>
<name>mapreduce.reduce.class</name>
<value>com.yumecorp.WordCount$IntSumReducer</value>
</property>
<property>
<name>mapred.output.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapred.output.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>/user/${wf:user()}/${examplesRoot}/input-data/decodeAndProfile</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>
</property>
<property>
<name>mapreduce.job.outputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat</value>
</property>
<property>
<name>mapred.mapoutput.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapred.mapoutput.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>MR failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
-----------------------job.xml------------------------------
<configuration>
<property>
<name>jobTracker</name>
<value>pari.hhh.com:8021</value>
</property>
<property>
<name>nameNode</name>
<value>hdfs://pari.hhh.com:8020</value>
</property>
<property>
<name>queueName</name>
<value>default</value>
</property>
<property>
<name>examplesRoot</name>
<value>wordcount</value>
</property>
<property>
<name>outputDir</name>
<value>map-reduce</value>
</property>
</configuration>
-----------------------job.properties------------------------------
examplesRoot=wordcount
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce/mainworkflow.xml
-----------------------oozie command -----------------------------
oozie job -config job.properties -run
-----------------------ERROR -------------------------------------
JOB[0000012-150131091133585-oozie-oozi-W] ACTION[0000012-150131091133585-oozie-oozi-W#wordcount-node] ELException in ActionStartXCommand
javax.servlet.jsp.el.ELException: variable [jobTracker] cannot be resolved
at org.apache.oozie.util.ELEvaluator$Context.resolveVariable(ELEvaluator.java:106)
at org.apache.commons.el.NamedValue.evaluate(NamedValue.java:124)
at org.apache.commons.el.ExpressionString.evaluate(ExpressionString.java:114)
at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:274)
at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:190)
at org.apache.oozie.util.ELEvaluator.evaluate(ELEvaluator.java:203)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:188)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
at org.apache.oozie.command.XCommand.call(XCommand.java:281)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Can anybody help?
Cheers,
Pari
To pass properties from a workflow to its sub-workflow you need to add the <propagate-configuration/> tag to the <sub-workflow> action.
For example:
<action name="main-node">
<sub-workflow>
<app-path>/user/${wf:user()}/${examplesRoot}/apps/map-reduce/workflow.xml</app-path>
<propagate-configuration/>
</sub-workflow>
<ok to="end" />
<error to="fail" />
</action>
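Separately, since the ELException says variable [jobTracker] cannot be resolved, it can also help to declare the variables the sub-workflow expects up front. A sketch, assuming workflow schema 0.4 or later (which supports a <parameters> section); the default value shown is illustrative:

```xml
<!-- Sketch: a <parameters> block placed directly under <workflow-app>
     (before <global> and <start>) declares the variables the workflow
     expects; submission fails fast if a required one is missing,
     instead of failing later with an ELException. -->
<parameters>
  <property>
    <name>jobTracker</name>
  </property>
  <property>
    <name>nameNode</name>
  </property>
  <property>
    <name>queueName</name>
    <value>default</value>
  </property>
</parameters>
```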