Oozie workflow to move file from local to HDFS

I have written an Oozie workflow to copy a file from the local filesystem to HDFS. The job runs without reporting any error, but the file never appears in HDFS.
Here is my code:
job.properties
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
queueName=default
oozie.wf.application.path=${nameNode}/crazyoozie
focusNodeLogin=cloudera
shellScriptPath= /home/cloudera/Desktop/script.sh
workflow.xml
<workflow-app name="WorkFlowForSshAction" xmlns="uri:oozie:workflow:0.1">
<start to="sshAction"/>
<action name="sshAction">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${focusNodeLogin}</host>
<command>${shellScriptPath}</command>
<capture-output/>
</ssh>
<ok to="end"/>
<error to="killAction"/>
</action>
<kill name="killAction">
<message>"Killed job due to error"</message>
</kill>
<end name="end"/>
</workflow-app>
script.sh
hadoop fs -put /home/cloudera/Desktop/oozieinput /oozieresults-sshAction
status=$?
if [ $status = 0 ]; then
echo "STATUS=SUCCESS"
else
echo "STATUS=FAIL"
fi
script.sh is on the local filesystem, and the output directory oozieresults-sshAction exists on HDFS.
Could you please help me with this?

You are using an ssh action to run a shell script, and the two do not fit together. To execute a shell script, create a shell action like this:
<workflow-app name="WorkFlowForShellAction" xmlns="uri:oozie:workflow:0.3">
<start to='shell' />
<action name='shell'>
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${shellscript}</exec>
<file>${shellScriptPath}#${shellscript}</file>
</shell>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
===============================================================
JOB.PROPERTIES
===============================================================
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
queueName=default
oozie.wf.application.path=${nameNode}/crazyoozie
focusNodeLogin=cloudera
shellScriptPath= /path/to/hdfs/script.sh
shellscript= script.sh
In shellScriptPath, give the HDFS path where you placed the script. shellscript is the name the script is linked under in the action's working directory.
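A side note on script.sh from the question: because its last command is an echo, the script exits with status 0 even when the put fails, which may be one reason the job reported no error. A minimal sketch that keeps the STATUS line but also returns the real exit status (the function name run_and_report is hypothetical, not part of Oozie or Hadoop):

```shell
#!/bin/sh
# Run the given command, print a STATUS line, and return the command's
# exit status so the caller (and, through the action, Oozie) sees a failure.
run_and_report() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "STATUS=SUCCESS"
    else
        echo "STATUS=FAIL"
    fi
    return "$status"
}

# Hypothetical use in script.sh, with the paths from the question:
# run_and_report hadoop fs -put /home/cloudera/Desktop/oozieinput /oozieresults-sshAction
# exit $?
```

With the two commented lines enabled, a failed put should make the script exit non-zero instead of only printing STATUS=FAIL.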

Related

Oozie coordinator get day of the week

I am trying to create a condition in my Oozie workflow so that an action at the end of the workflow is executed only on Mondays.
So far I added a decision node in the workflow, and the current date as parameter in the coordinator, and I need to test the day of the week.
coordinator.xml
<coordinator-app name="${project}_coord" frequency="${coord_frequency}" start="${coord_start_date}" end="${coord_end_date}" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
<controls>
<concurrency>1</concurrency>
<execution>LAST_ONLY</execution>
</controls>
<action>
<workflow>
<app-path>${wf_application_path}</app-path>
<configuration>
<property>
<name>currentDate</name>
<value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), 0, 'DAY'), "yyyyMMdd")}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
workflow.xml
<decision name = "send_on_monday">
<switch>
<case to = "send_file">
${currentDay} eq "MON" <-------- test on day of the week
</case>
<default to = "sendSuccessEmail" />
</switch>
</decision>
<action name="send_file">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${remoteNode}</host>
<command>/pythonvenv</command>
<args>${fsProjectDir}/send_file.py</args>
</ssh>
<ok to="sendSuccessEmail"/>
<error to="sendTechnicalFailureEmail"/>
</action>
I didn't find information on how to get the day of the week with EL functions.
Any help is appreciated.
I found a solution by using wf:actionData in a decision node:
workflow.xml
<action name="getDayOfWeek">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${remoteNode}</host>
<command>${fsProjectDir}/scripts/dayOfWeek.sh</command>
<capture-output/>
</ssh>
<ok to="send_on_monday"/>
<error to="sendTechnicalFailureEmail"/>
</action>
<decision name="send_on_monday">
<switch>
<case to = "send_file">
${wf:actionData('getDayOfWeek')['day'] eq 'Mon'}
</case>
<default to = "sendSuccessEmail" />
</switch>
</decision>
dayOfWeek.sh
#!/bin/sh
DOW=$(date +"%a")
echo day=$DOW
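One caveat with dayOfWeek.sh: `date +"%a"` prints the abbreviated weekday in the host's locale, so the comparison with 'Mon' could fail on a host that is not set to English. A minimal variant, assuming the same output key (day) as the script above:

```shell
#!/bin/sh
# Force the C locale so %a always yields English abbreviations (Mon, Tue, ...),
# regardless of the remote host's locale settings.
DOW=$(LC_ALL=C date +%a)
# <capture-output/> reads key=value lines from stdout.
echo "day=$DOW"
```

The decision node can stay as it is, since the key and the 'Mon' value are unchanged.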

Oozie variable for all actions

Is there a way to set up a global variable for all actions in a workflow?
I need to define a variable with an initial value and then have the actions modify that same variable.
I tried:
<workflow-app name="test" xmlns="uri:oozie:workflow:0.5">
<global>
<configuration>
<property>
<name>variable1</name>
<value>/some/path</value>
</property>
</configuration>
</global>
.....
<action name="wf1">
....
<property>
<name>variable1</name>
<value>/some/other/path</value>
</property>
</action>
....
<action name="wf2">
.....
<property>
<name>variable1</name>
<value>/some/second/path</value>
</property>
....
</action>
<action name="createFolder">
<fs>
<mkdir path="${variable1}"/>
</fs>
<ok to="End"/>
<error to="Kill"/>
</action>
I would like to let actions modify the value and then use it in another action. Is that possible? Right now I'm getting: VARIABLE variable1 cannot be resolved
You can do this using action configuration.
You can even define default values for each action type.
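A different technique from the action configuration mentioned above, offered only as a sketch: as far as I know, a property set inside one action's configuration is not available as a `${...}` variable in other nodes. One way to hand a value to a later node is a shell action with `<capture-output/>` whose script prints key=value lines, which a later node can read with `${wf:actionData('wf1')['variable1']}`. The function name emit_property is hypothetical, and the action name and path are reused from the question:

```shell
#!/bin/sh
# Print one key=value line in the Java-properties style that
# <capture-output/> reads from stdout.
emit_property() {
    printf '%s=%s\n' "$1" "$2"
}

# Hypothetical value decided by this action; a later node could read it with
# ${wf:actionData('wf1')['variable1']}
emit_property variable1 /some/other/path
```

In the createFolder action, the mkdir path would then reference the action data instead of `${variable1}`.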

Oozie workflow not running properly

I have created a new Oozie workflow in Hue UI based on the below hql query.
things.hql
drop table output_table;
create table output_table like things;
insert overwrite table output_table select t.* from things_dup td right outer join things t on (td.item_id = t.item_id) where td.item_id is null;
insert overwrite table things_dup select * from things;
The tables are,
things table
item_id product
1 soap
2 chocklate
things_dup
item_id product
1 soap
When I run the HQL separately with
hive -f things.hql
it works fine and the things_dup table is updated properly.
But when I run the workflow, the things_dup table is not updated. This statement appears to have no effect:
insert overwrite table things_dup select * from things;
Does anyone know why? Please help me fix this issue.
Workflow.xml
<workflow-app name="Things_workflow" xmlns="uri:oozie:workflow:0.4">
<start to="Things_workflow"/>
<action name="Things_workflow">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>/user/cloudera/hive-site.xml</job-xml>
<script>things.hql</script>
<file>/user/cloudera/hive-site.xml#hive-site.xml</file>
</hive>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
action
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>localhost.localdomain:8021</job-tracker>
<name-node>hdfs://localhost.localdomain:8020</name-node>
<job-xml>/user/cloudera/hive-site.xml</job-xml>
<script>things.hql</script>
<file>/user/cloudera/hive-site.xml#hive-site.xml</file>
</hive>
Thanks,
manimekalai
I also had an issue with the Hive action in Oozie and finally solved it.
Please compare your workflow.xml with this one.
Make sure the job-xml element comes before the configuration element.
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-hive-wf">
<start to="hive-node" />
<action name="hive-node">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>oozie.hive.defaults</name>
<value>/user/cloudera/oozie/pig_hive_data_WF/hive-site.xml</value>
</property>
</configuration>
<script>/user/cloudera/oozie/pig_hive_data_WF/load_data.q</script>
<param>LOCATION=/user/${wf:user()}/oozie/pig_hive_data_WF/output/pig_loaded_data</param>
<file>hive-conf.xml#hive-conf.xml</file>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Pig Hive job is failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end" />
</workflow-app>
You can copy hive-site.xml from /etc/hive/conf/hive-site.xml and put it in the workflow directory.
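Copying the file into the workflow directory could look like the sketch below. It assumes the workflow directory is on HDFS at the path used in the workflow above. The function name stage_hive_site and the HDFS_CMD override (there only so the command can be dry-run with echo) are hypothetical.

```shell
#!/bin/sh
# Upload the local Hive configuration into the given HDFS workflow directory.
# HDFS_CMD is a hypothetical override hook; it defaults to the real hdfs CLI.
stage_hive_site() {
    wf_dir=$1
    # -f overwrites an existing copy in the workflow directory
    ${HDFS_CMD:-hdfs} dfs -put -f /etc/hive/conf/hive-site.xml "$wf_dir/hive-site.xml"
}

# Example call, using the directory from the workflow above:
# stage_hive_site /user/cloudera/oozie/pig_hive_data_WF
```

With the file in place, the relative `<job-xml>hive-site.xml</job-xml>` reference resolves against the workflow application directory.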

Pass -Dpig.notification.listener argument to Pig action in Oozie (Oozie client build version: 3.3.2-cdh4.7.1)

I want to pass the -Dpig.notification.listener argument to a Pig action in Oozie. When I do this, Pig fails and there is no clear error description. Below is the Pig action snippet. Any suggestions on how to pass the listener to the Pig action?
<action name="FinalReport">
<pig>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/home/hadoop/surjan/path" />
</prepare>
<script>${nameNode}//home/hadoop/surjan/wf/pigs/FinalReport.pig</script>
<param>inDate=20160430</param>
</pig>
<ok to="success" />
<error to="kill" />
</action>
EDIT: I found the solution for passing the -D arguments. Posting it in case anybody else is facing the same issue.
<action name="FinalReport">
<pig>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/home/hadoop/surjan/path" />
</prepare>
<configuration>
<property>
<name>pig.notification.listener</name>
<value>com.surjan.util.counter.MyListener</value>
</property>
</configuration>
<script>${nameNode}//home/hadoop/surjan/wf/pigs/FinalReport.pig</script>
<param>inDate=20160430</param>
<argument>-Dpig.notification.listener=com.surjan.util.counter.CounterListener</argument>
<archive>archive/mycounter.jar#mycounter</archive>
</pig>
<ok to="success" />
<error to="kill" />
</action>

Oozie Invalid Transition

<workflow-app name="Oozie_app" xmlns="uri:oozie:workflow:0.1">
<start to="TransformWeatherData"/>
<action name="TransformWeatherData">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<exec>/home/kingsly/working_directory/copyFromLocal.sh</exec>
<file>/home/kingsly/working_directory/copyFromLocal.sh</file>
</shell>
<ok to="Oozie_app"/>
<error to="end"/>
</action>
<end name='end' />
</workflow-app>
I am new to Oozie and have created a workflow and a job.properties file.
The workflow.xml above is what I am submitting.
When I submit this workflow, I get this error:
Error: E0708 : E0708: Invalid transition, node [TransformWeatherData] transition [Oozie_app]
Please help me resolve this.
My main objective is to move a file from the local machine to HDFS, and I have included the Hadoop command in the shell script.
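The ok transition pointed to Oozie_app, which is the name of the workflow, not a node in it. Every transition must target a node that is defined in the workflow. I changed it to end and added a kill node for the error transition: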
<workflow-app name="Oozie_app" xmlns="uri:oozie:workflow:0.1">
<start to="TransformWeatherData"/>
<action name="TransformWeatherData">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<exec>/home/kingsly/working_directory/copyFromLocal.sh</exec>
<file>/home/kingsly/working_directory/copyFromLocal.sh</file>
</shell>
<ok to="end"/>
<error to="kill" />
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
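E0708 errors of this kind can be caught before submitting. The sketch below is a rough text-based check, not an XML parser, and the function name undefined_targets is hypothetical. It lists every to= target that has no matching node name in the file:

```shell
#!/bin/sh
# Print each transition target (to="...") that is not defined as a node name.
undefined_targets() {
    wf=$1
    # Collect node names from the name attribute of node elements only,
    # so the workflow-app name and <name> property elements are ignored.
    names=$(grep -Eo "<(action|kill|end|decision|fork|join)[^>]* name *= *[\"'][^\"']+[\"']" "$wf" |
        sed -E "s/.*name *= *[\"']([^\"']+)[\"']/\1/")
    # Collect every to= target and report the ones missing from the name list.
    grep -Eo " to *= *[\"'][^\"']+[\"']" "$wf" |
        sed -E "s/.*[\"']([^\"']+)[\"']/\1/" | sort -u |
        while read -r target; do
            echo "$names" | grep -qxF -- "$target" || echo "$target"
        done
}

# Example call (file name is an assumption):
# undefined_targets workflow.xml
```

Run against the workflow from the question, this should print Oozie_app, the target that triggered the error.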

Resources