Run shell script Oozie action

I am trying to execute a shell script before my Pig script using Oozie. As far as I can tell, I am doing the same things as every example I can find. My action is:
<action name="shell_action" cred="yca_auth">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${appPath}/shell_script.sh</exec>
<file>${appPath}/shell_script.sh#shell_script.sh</file>
<ok to="pig_script_action"/> <error to="kill"/>
</shell>
</action>
But I keep getting the error:
Caused by: org.apache.oozie.workflow.WorkflowException: E0701: XML schema error, cvc-complex-type.2.4.a: Invalid content was found starting with element 'ok'. One of '{"uri:oozie:shell-action:0.1":file, "uri:oozie:shell-action:0.1":archive, "uri:oozie:shell-action:0.1":capture-output}' is expected.
I do not understand why this is happening. Please help.

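The problem was that the ok to and error to transitions should not have been inside the <shell> element. They belong directly under <action>, after the closing </shell> tag.

The right configuration is as follows: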
<action name="shell_action" cred="yca_auth">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>${appPath}/shell_script.sh</exec>
<file>${appPath}/shell_script.sh#shell_script.sh</file>
</shell>
<ok to="pig_script_action"/>
<error to="kill"/>
</action>
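For reference, here is the same action with a few optional elements filled in, to show where they go. The ordering follows the shell-action:0.1 schema as I understand it (job-tracker, name-node, optional prepare/job-xml/configuration, exec, then argument, env-var, file, archive and capture-output). The ${inputDir} property and the RUN_MODE variable are hypothetical placeholders.

```xml
<action name="shell_action" cred="yca_auth">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <!-- Children of <shell> must appear in schema order -->
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>${appPath}/shell_script.sh</exec>
        <!-- hypothetical argument passed to the script -->
        <argument>${inputDir}</argument>
        <!-- hypothetical environment variable for the script -->
        <env-var>RUN_MODE=daily</env-var>
        <file>${appPath}/shell_script.sh#shell_script.sh</file>
        <!-- optional; must come last inside <shell> -->
        <capture-output/>
    </shell>
    <!-- Transitions are children of <action>, not of <shell> -->
    <ok to="pig_script_action"/>
    <error to="kill"/>
</action>
```

With <capture-output/>, key=value lines that the script prints to stdout can be read in later nodes through wf:actionData('shell_action')['key']; drop the element if you do not need that.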

Related

Execute all the oozie actions on the same node

I need to execute an Oozie workflow to process some files using a Java program, but before executing the jar, I need to get the files from HDFS, and after they have been processed, delete them.
I came up with this workflow with three shell actions, and it runs, but only the first shell action works. After digging into the log files, I realized that shell actions 2 and 3 were never executed. Unless they were executed on different nodes, there is no reason why they should not work; I tested the three shell scripts on the edge node and they work.
<start to="shellAction"/>
<action name="shellAction">
<shell xmlns="uri:oozie:shell-action:0.3">
<exec>${shellCmd1}</exec>
<file>${shellCmd1}#${shellCmd1}</file>
</shell>
<ok to="shellAction2"/>
<error to="fail" />
</action>
<action name="shellAction2">
<shell xmlns="uri:oozie:shell-action:0.3">
<exec>java</exec>
<argument>arg</argument>
<argument>./${EXEC}:`classpath` MainClass</argument>
<argument>"${arg1}"</argument>
<argument>"${arg2}"</argument>
<argument>${arg3}</argument>
<file>${EXEC}#${EXEC}</file>
</shell>
<ok to="shellAction3"/>
<error to="shellAction3" />
</action>
<action name="shellAction3">
<shell xmlns="uri:oozie:shell-action:0.3">
<exec>${shellCmd3}</exec>
<file>${shellCmd3}#${shellCmd3}</file>
</shell>
<ok to="end"/>
<error to="fail" />
</action>
So my question is: is there a way to guarantee that those three actions execute on the same container/node?
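As far as I know, Oozie launches each action through its own launcher job, so there is no workflow setting that pins consecutive actions to the same node. One workaround, sketched here under that assumption, is to fold the three steps into a single shell action that runs one wrapper script. The run_all.sh script and the ${appPath}, ${jobTracker} and ${nameNode} properties are hypothetical; ${EXEC} is reused from the question.

```xml
<action name="runAllSteps">
    <shell xmlns="uri:oozie:shell-action:0.3">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- run_all.sh (hypothetical) would: 1) copy the input files from HDFS,
             2) run the Java program, 3) delete the processed files -->
        <exec>run_all.sh</exec>
        <file>${appPath}/run_all.sh#run_all.sh</file>
        <!-- ship the jar alongside the wrapper so step 2 can find it locally -->
        <file>${EXEC}#${EXEC}</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Because all three steps run inside one launcher container, local files written by the first step are still present for the later ones.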

Oozie fork call same action multiple times

I am trying to run multiple instances of the same Oozie action in parallel using fork. While trying to do so, I got an error stating
"E0744" A fork is not allowed to have multiple transitions to the same node
I looked at the Oozie codebase as well (LiteWorkflowAppParser) and found that Oozie indeed does not allow calling the same action multiple times in a fork, because it validates for that. I then disabled the validation using oozie.wf.validate.ForkJoin=false and ran the workflow again. This time the workflow runs, but only one instance of the action runs. It seems that even though I have disabled validation, Oozie underneath lets only unique actions run, and duplicate actions are skipped.
Now my question is: how can I run multiple Oozie actions in parallel?
My workflow is like:
<workflow-app xmlns="uri:oozie:workflow:0.4" name="my-workflow">
<start to="parallelize"/>
<fork name="parallelize">
<path start="performAction" />
<path start="performAction" />
<path start="performAction" />
</fork>
<action name="performAction">
.......
<ok to="joinForks"/>
<error to="fail"/>
</action>
<join name="joinForks" to="end" />
<kill name="fail">
<message>Responder Application, error
message[${wf:errorMessage(wf:lastErrorNode())}]
</message>
</kill>
<end name="end"/>
</workflow-app>
Additional details:
With the above configuration, I see that the workflow is stuck after performAction completes, in the transition to the joinForks stage. It looks like joinForks is waiting for the remaining actions to report to it so that it can end. But the remaining actions were never launched, so the workflow waits indefinitely.
After this, I copied performAction into three different actions, performAction1, performAction2 and performAction3, and used them in the fork paths. Now the workflow completes, since joinForks gets called from all the fork paths. But I would prefer not to have to use this workaround of duplicating the same action again and again under different names. Any help is appreciated.
Thanks
I suspect that the error message you're getting is because of this code:
<fork name="parallelize">
<path start="performAction" />
<path start="performAction" />
<path start="performAction" />
</fork>
<action name="performAction">
...
<ok to="joinForks"/>
<error to="fail"/>
</action>
I suspect each fork path must start at a uniquely named action, like this:
<fork name="parallelize">
<path start="performAction1" />
<path start="performAction2" />
<path start="performAction3" />
</fork>
<action name="performAction1">
...
<ok to="joinForks"/>
<error to="fail"/>
</action>
<action name="performAction2">
...
<ok to="joinForks"/>
<error to="fail"/>
</action>
<action name="performAction3">
...
<ok to="joinForks"/>
<error to="fail"/>
</action>
It seems that having three fork paths that all start at the same action violates the requirement that fork transitions be unique.
Can you try uniquely named individual actions instead?
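If the goal is to avoid copying the whole action body three times, one option, assuming the shared logic can be moved into its own workflow application, is to make each fork path a thin sub-workflow action. The action names still have to be unique, but each one is only a few lines. The ${subWorkflowPath} property and the instanceId parameter are hypothetical.

```xml
<fork name="parallelize">
    <path start="performAction1"/>
    <path start="performAction2"/>
    <path start="performAction3"/>
</fork>
<!-- Each action only points at the same sub-workflow; instanceId
     (hypothetical) lets the sub-workflow tell the instances apart -->
<action name="performAction1">
    <sub-workflow>
        <app-path>${subWorkflowPath}</app-path>
        <propagate-configuration/>
        <configuration>
            <property><name>instanceId</name><value>1</value></property>
        </configuration>
    </sub-workflow>
    <ok to="joinForks"/>
    <error to="fail"/>
</action>
<action name="performAction2">
    <sub-workflow>
        <app-path>${subWorkflowPath}</app-path>
        <propagate-configuration/>
        <configuration>
            <property><name>instanceId</name><value>2</value></property>
        </configuration>
    </sub-workflow>
    <ok to="joinForks"/>
    <error to="fail"/>
</action>
<action name="performAction3">
    <sub-workflow>
        <app-path>${subWorkflowPath}</app-path>
        <propagate-configuration/>
        <configuration>
            <property><name>instanceId</name><value>3</value></property>
        </configuration>
    </sub-workflow>
    <ok to="joinForks"/>
    <error to="fail"/>
</action>
<join name="joinForks" to="end"/>
```

Inside the sub-workflow, the value is available as ${instanceId}, so any per-instance differences (input paths, output names) can be derived from it.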

oozie distcp job execution

I have an Oozie workflow which performs a distcp operation.
The workflow file is as below:
<workflow-app xmlns="uri:oozie:workflow:0.3" name="distcp-wf">
<start to="distcp-node"/>
<action name="distcp-node">
<distcp xmlns="uri:oozie:distcp-action:0.1">
<job-tracker>${jobtracker}</job-tracker>
<name-node>${namenode}</name-node>
<prepare>
<delete path="${namenode}/tmp/mohit/"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queue_name}</value>
</property>
</configuration>
<arg>-m 1</arg>
<arg>${number_of_mapper}</arg>
<arg>-skipcrccheck</arg>
<arg>${namenode}/tmp/mohit/data.txt</arg>
</distcp>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>DistCP failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
I want to set the number of mappers using DistCp's -m option.
How can I do that? I have tried
<arg>-m 1</arg>
and
<arg>1</arg>
but neither worked for me.
The error that I am getting is as below:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.DistcpMain], main() threw exception, Returned value from distcp is non-zero (-1)
java.lang.RuntimeException: Returned value from distcp is non-zero (-1)
The arg elements are for input/output, as described in the documentation:
the first arg indicates the input and the second arg indicates the output.
To change the number of mappers/reducers, use the configuration, for example:
<configuration>
<property>
<name>mapred.reduce.tasks</name>
<value>${firstJobReducers}</value>
</property>
</configuration>
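Following that description (first arg is the input, second arg is the output), a distcp action with both paths might look like the sketch below. The destination ${namenode}/tmp/mohit_copy/ is a hypothetical path. Note also that the posted workflow's prepare block deletes ${namenode}/tmp/mohit/, the directory that holds the source file, before the copy starts; this sketch points the delete at the hypothetical destination instead.

```xml
<action name="distcp-node">
    <distcp xmlns="uri:oozie:distcp-action:0.1">
        <job-tracker>${jobtracker}</job-tracker>
        <name-node>${namenode}</name-node>
        <prepare>
            <!-- clear the (hypothetical) destination, not the source -->
            <delete path="${namenode}/tmp/mohit_copy/"/>
        </prepare>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queue_name}</value>
            </property>
        </configuration>
        <!-- first arg: input -->
        <arg>${namenode}/tmp/mohit/data.txt</arg>
        <!-- second arg: output (hypothetical path) -->
        <arg>${namenode}/tmp/mohit_copy/data.txt</arg>
    </distcp>
    <ok to="end"/>
    <error to="fail"/>
</action>
```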

Oozie workflow error

I am trying to write a simple Oozie workflow to execute a simple Hive script. The Hive script runs fine without any issue on its own, but I am getting an issue with the workflow execution.
The workflow.xml is as below:
<workflow-app xmlns="uri:oozie:workflow:0.4" name="nielsen_dma_xref_intr_dly_load_wf">
<credentials>
<credential name="hive2_cred" type="hive2">
<property>
<name>hive2.jdbc.url</name>
<value>${hive2_jdbc_uri}</value>
</property>
<property>
<name>hive2.server.principal</name>
<value>${hive2_server_principal}</value>
</property>
</credential>
</credentials>
<start to="nielsen_dma_xref_intr_dly_load_wf_start"/>
<action name="hive_load_nielsen_dma_xref_oozie" cred='hive2_cred'>
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>${hiveSiteXML}</job-xml>
<jdbc-url>${jdbc_hive}</jdbc-url>
<password>${password}</password>
<script>${scripts}/nielsen_dma_xref_load.hql</script>
<param>db_dbname_dbname=${db_dbname}</param>
</hive2>
<ok to="hive_load_nielsen_dma_xref_oozie"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="nielsen_dma_xref_intr_dly_load_wf_completed"/>
</workflow-app>
All files are placed under the /user/kchandr2/wf folder in HDFS.
I am executing the workflow using the command
oozie job -oozie http://<<hostname>>:11000/oozie -config /home/kchandr2/wf/nielsen_dma_xref_intr_dly_load_wf.properties -run -verbose >> /home/kchandr2/wf/Logs/nielsen_dma_xref_intr_dly_load_wf_$(date '+%Y%m%d').log 2>&1
where the properties file is placed in the local directory /home/kchandr2/wf.
I am getting the Error: E0706 : E0706: Node cannot transition to itself node [hive_load_nielsen_dma_xref_oozie]
In your hive2 action
<action name="hive_load_nielsen_dma_xref_oozie" cred='hive2_cred'>
you have an "ok" transition to the very same action node
<ok to="hive_load_nielsen_dma_xref_oozie"/>
This is not allowed. That's why you get the error:
Error: E0706 : E0706: Node cannot transition to itself node [hive_load_nielsen_dma_xref_oozie]
Oozie workflows are DAGs (Directed Acyclic Graphs). You cannot have loops.
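A corrected version of the action, with the ok transition pointing forward to the end node, might look like the sketch below. The posted <start> also targets nielsen_dma_xref_intr_dly_load_wf_start, a name that is not defined anywhere in the XML, so the sketch assumes the workflow is meant to start at the hive2 action.

```xml
<!-- start must name an existing node; assumed to be the hive2 action -->
<start to="hive_load_nielsen_dma_xref_oozie"/>
<action name="hive_load_nielsen_dma_xref_oozie" cred="hive2_cred">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>${hiveSiteXML}</job-xml>
        <jdbc-url>${jdbc_hive}</jdbc-url>
        <password>${password}</password>
        <script>${scripts}/nielsen_dma_xref_load.hql</script>
        <param>db_dbname_dbname=${db_dbname}</param>
    </hive2>
    <!-- transition forward to the end node instead of back to this action -->
    <ok to="nielsen_dma_xref_intr_dly_load_wf_completed"/>
    <error to="fail"/>
</action>
```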

Oozie workflow parameter not getting set from coordinator

I have the following workflow XML and coordinator XML, both created through the Hue Oozie Editor.
<workflow-app name="demo8" xmlns="uri:oozie:workflow:0.4">
<start to="cds4"/>
<action name="cds4">
<fs>
<mkdir path='${nameNode}/my/path/towritefile/${wf:conf(DATE)}'/>
</fs>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
The coordinator configuration is below:
<coordinator-app name="Demo4CoordinatorNew"
frequency="${coord:minutes(5)}"
start="2015-01-18T18:15Z" end="2015-01-19T10:46Z" timezone="US/Pacific"
xmlns="uri:oozie:coordinator:0.2">
<controls>
<concurrency>1</concurrency>
<execution>FIFO</execution>
</controls>
<action>
<workflow>
<app-path>${wf_application_path}</app-path>
<configuration>
<property>
<name>DATE</name>
<value>${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
I have executed the coordinator. The value passed for the DATE parameter is blank. Do you see any issue?
In short, I'm trying to create a folder in HDFS based on the time at which the workflow is triggered.
I also tried
<mkdir path='${nameNode}/my/path/towritefile/${wf:conf("DATE")}'/>
When I do this, it gives an error.
In the workflow, replace ${wf:conf(DATE)} with ${DATE}; that way it will be parameterized correctly.
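With that change applied, the fs action would read as follows. DATE is the property set in the coordinator's <configuration> block, and properties passed to the workflow can be referenced directly as EL variables.

```xml
<action name="cds4">
    <fs>
        <!-- ${DATE} comes from the coordinator, formatted as yyyy-MM-dd -->
        <mkdir path='${nameNode}/my/path/towritefile/${DATE}'/>
    </fs>
    <ok to="end"/>
    <error to="kill"/>
</action>
```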
