Issue with oozie workflow submitting sh to launch spark2 job

Let me explain the issue; it's driving me crazy...
I have a spark2 program that I want to submit from an Oozie workflow.
Because spark2 cannot be submitted directly from Oozie by default, I have created a shell script containing the spark2-submit command that launches the job.
If I run this script from the console it works perfectly. However, when I run it from the Oozie workflow I cannot get it to work, and worse, I cannot see a clear error in the logs returned by the execution.
These are the files I am using:
thintest.sh:
spark2-submit --master yarn --class main hdfs:///home/cloudera/thintest/thintest-assembly-0.1.jar
job.properties
oozie.use.system.libpath=True
dryrun=False
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=quickstart.cloudera:8032
security_enabled=False
hiveXml=/etc/hive/conf/hive-site.xml
appName=thintest
appPath=${nameNode}/home/cloudera/thintest
oozie.wf.application.path=${appPath}/workflow.xml
shellPath=${appPath}/thintest.sh
workflow.xml
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app name="thintest" xmlns="uri:oozie:workflow:0.5">
    <start to="shell"/>
    <action name="shell">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>${shellPath}</exec>
            <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
            <file>${shellPath}#thintest.sh</file>
        </shell>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Any help would be greatly appreciated.

Oozie has a spark-2 profile and the Spark action in Oozie works perfectly fine after Oozie is built with it.
I wouldn't recommend hacking around the shell action as it will turn into a nightmare sooner or later.
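For reference, a Spark action for the job in the question might look roughly like the sketch below. It assumes that the Oozie sharelib on the cluster actually contains Spark 2 (for example because Oozie was built with the spark-2 profile, as mentioned above) and that the spark-action:0.2 schema is available. The class name, jar name and ${...} properties come from the question; the choice of cluster mode is an assumption, and none of this has been tested against that setup.

```xml
<action name="spark">
    <spark xmlns="uri:oozie:spark-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- The question uses "master yarn"; cluster mode is an assumption -->
        <master>yarn</master>
        <mode>cluster</mode>
        <name>${appName}</name>
        <!-- Main class and jar as given in thintest.sh -->
        <class>main</class>
        <jar>${appPath}/thintest-assembly-0.1.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="kill"/>
</action>
```

With an action like this, the thintest.sh wrapper and the shell action would no longer be needed.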

Related

Run shell script Oozie action

I am trying to execute a shell script before my pig script using Oozie. As far as I can tell, I am doing all the same things as every example I can find. My action is:
<action name="shell_action" cred="yca_auth">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>${appPath}/shell_script.sh</exec>
        <file>${appPath}/shell_script.sh#shell_script.sh</file>
        <ok to="pig_script_action"/>
        <error to="kill"/>
    </shell>
</action>
But I keep getting the error:
Caused by: org.apache.oozie.workflow.WorkflowException: E0701: XML schema error, cvc-complex-type.2.4.a: Invalid content was found starting with element 'ok'. One of '{"uri:oozie:shell-action:0.1":file, "uri:oozie:shell-action:0.1":archive, "uri:oozie:shell-action:0.1":capture-output}' is expected.
I do not understand why this is happening. Please help
The problem was that the ok and error transitions should not have been inside the <shell> element.
The correct configuration is as follows:
<action name="shell_action" cred="yca_auth">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>${appPath}/shell_script.sh</exec>
        <file>${appPath}/shell_script.sh#shell_script.sh</file>
    </shell>
    <ok to="pig_script_action"/>
    <error to="kill"/>
</action>

Oozie shell action issue while creating directories

I am unable to add/delete any files or directories on HDFS from a shell script which I am executing from Oozie workflow.
The username is "scitest" and the hdfs path I am trying to edit/add/delete is
/user/scitest/.
In the shell script I am trying to delete a folder named test123456 from the path /user/scitest/.
---------------Error from oozie log------------------
429737-oozie-oozi-W#shell-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
2016-12-27 05:04:25,553 INFO ActionEndXCommand:520 - SERVER[vscihadoopvm2.manhdev.com] USER[scitest] GROUP[-] TOKEN[] APP[shell.workflow] JOB[0000041-161208225429737-oozie-oozi-W] ACTION[0000041-161208225429737-oozie-oozi-W#shell-node] ERROR is considered as FAILED for SLA
---------shell-script(sample.sh) content----------
#!/bin/bash
echo "`date` hi" > output.log
hadoop fs -mkdir test123456
-------job.properties---------
nameNode=hdfs://vscihadoopvm1.manhdev.com:8020
jobTracker=vscihadoopvm2.manhdev.com:8050
master=yarn-cluster
#user.name=yarn
queueName=default
examplesRoot=oozietest
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}
---workflow.xml---
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="shell.workflow">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>sample.sh</exec>
            <file>sample.sh#sample.sh</file>
        </shell>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Error in Shell.Please refer the Oozie Logs</message>
    </kill>
    <end name="end"/>
</workflow-app>
@Abhiroy, it may be better to first check which user your Oozie action is being executed as. You can simply add the id command to your sample shell script and run the workflow. Then check the Oozie job logs to see which user the container executor runs your script as. From there we can start tracing any permission issues.

Oozie work flow error

I am trying to write a simple Oozie workflow to execute a simple Hive script. The Hive script itself runs fine on its own, but the workflow execution fails.
The workflow.xml is as below:
<workflow-app xmlns="uri:oozie:workflow:0.4" name="nielsen_dma_xref_intr_dly_load_wf">
    <credentials>
        <credential name="hive2_cred" type="hive2">
            <property>
                <name>hive2.jdbc.url</name>
                <value>${hive2_jdbc_uri}</value>
            </property>
            <property>
                <name>hive2.server.principal</name>
                <value>${hive2_server_principal}</value>
            </property>
        </credential>
    </credentials>
    <start to="nielsen_dma_xref_intr_dly_load_wf_start"/>
    <action name="hive_load_nielsen_dma_xref_oozie" cred='hive2_cred'>
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>${hiveSiteXML}</job-xml>
            <jdbc-url>${jdbc_hive}</jdbc-url>
            <password>${password}</password>
            <script>${scripts}/nielsen_dma_xref_load.hql</script>
            <param>db_dbname_dbname=${db_dbname}</param>
        </hive2>
        <ok to="hive_load_nielsen_dma_xref_oozie"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="nielsen_dma_xref_intr_dly_load_wf_completed"/>
</workflow-app>
All files are placed under the /user/kchandr2/wf folder in HDFS.
I am executing the workflow using the following command, where the properties file is placed in the local directory /home/kchandr2/wf:
oozie job -oozie http://<<hostname>>:11000/oozie -config /home/kchandr2/wf/nielsen_dma_xref_intr_dly_load_wf.properties -run -verbose >> /home/kchandr2/wf/Logs/nielsen_dma_xref_intr_dly_load_wf_$(date '+%Y%m%d').log 2>&1
I am getting this error:
Error: E0706 : E0706: Node cannot transition to itself node [hive_load_nielsen_dma_xref_oozie]
In your hive2 action
<action name="hive_load_nielsen_dma_xref_oozie" cred='hive2_cred'>
you have an "ok" transition to the very same action node
<ok to="hive_load_nielsen_dma_xref_oozie"/>
This is not allowed. That's why you get the error:
Error: E0706 : E0706: Node cannot transition to itself node [hive_load_nielsen_dma_xref_oozie]
Oozie workflows are DAGs (directed acyclic graphs), so loops are not allowed.
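As an abridged sketch of how the transitions could be rewired, assuming the intent is simply to finish at the end node once the Hive script succeeds: the ok transition points at the existing end node instead of back at the action. The start node in the posted workflow also points at nielsen_dma_xref_intr_dly_load_wf_start, which does not appear to be defined anywhere in that XML, so the sketch points it at the action as well. The credentials block and the hive2 element are omitted here and would stay as posted.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="nielsen_dma_xref_intr_dly_load_wf">
    <!-- credentials block unchanged, omitted in this sketch -->
    <start to="hive_load_nielsen_dma_xref_oozie"/>
    <action name="hive_load_nielsen_dma_xref_oozie" cred='hive2_cred'>
        <!-- hive2 element unchanged, omitted in this sketch -->
        <!-- On success, move forward to the end node, not back to this action -->
        <ok to="nielsen_dma_xref_intr_dly_load_wf_completed"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="nielsen_dma_xref_intr_dly_load_wf_completed"/>
</workflow-app>
```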

Oozie workflow parameter not getting set from coordinator

I have the following workflow XML and Coordinator XML, both created through Hue Oozie Editor.
<workflow-app name="demo8" xmlns="uri:oozie:workflow:0.4">
    <start to="cds4"/>
    <action name="cds4">
        <fs>
            <mkdir path='${nameNode}/my/path/towritefile/${wf:conf(DATE)}'/>
        </fs>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
The Coordinator conf is below
<coordinator-app name="Demo4CoordinatorNew"
                 frequency="${coord:minutes(5)}"
                 start="2015-01-18T18:15Z" end="2015-01-19T10:46Z" timezone="US/Pacific"
                 xmlns="uri:oozie:coordinator:0.2">
    <controls>
        <concurrency>1</concurrency>
        <execution>FIFO</execution>
    </controls>
    <action>
        <workflow>
            <app-path>${wf_application_path}</app-path>
            <configuration>
                <property>
                    <name>DATE</name>
                    <value>${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
I have executed the coordinator, but the value passed for the DATE parameter is blank. Do you see any issue?
In short, I'm trying to create a folder in HDFS based on the time at which the workflow is triggered.
I also tried
<mkdir path='${nameNode}/my/path/towritefile/${wf:conf("DATE")}'/>
When I do this, it gives an error.
In the workflow, replace ${wf:conf(DATE)} with ${DATE}; that way it will be parameterized correctly.
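Applied to the workflow in the question, the fs action would then read roughly as follows. This is a sketch that assumes the coordinator keeps passing DATE through its configuration block exactly as posted; only the path attribute changes.

```xml
<action name="cds4">
    <fs>
        <!-- DATE is the property set in the coordinator's <configuration> -->
        <mkdir path='${nameNode}/my/path/towritefile/${DATE}'/>
    </fs>
    <ok to="end"/>
    <error to="kill"/>
</action>
```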

How to make Hue - Oozie workflow run a java job which has config file?

I have a buildModel.jar and a folder "conf" which contains a configuration file named config.properties.
The command line to run it looks like this:
hadoop jar /home/user1/buildModel.jar -t fp-purchased-products -i hdfs://Hadoop238:8020/user/user2/recommend_data/bought_together
After doing some analysis, it uses the database information in the config.properties file to store data in MongoDB.
Now I need to run it from a Hue Oozie workflow, so I used Hue to upload the jar file and the "conf" folder to HDFS and then created a workflow. I also added the config.properties file to the workflow.
This is the workflow.xml
<workflow-app name="test_service" xmlns="uri:oozie:workflow:0.4">
    <start to="run_java_file"/>
    <action name="run_java_file">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>xxx.xxx.recommender.buildModel.Application</main-class>
            <arg>-t=fp-purchased-products</arg>
            <arg>-i=hdfs://Hadoop238:8020/user/user2/recommend_data/bought_together</arg>
            <file>/user/user2/service/build_model/conf/config.properties#config.properties</file>
        </java>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
And this is the workflow-metadata.json
{"attributes": {"deployment_dir": "/user/hue/oozie/workspaces/_user2_-oozie-31-1416890719.12", "description": ""}, "nodes": {"run_java_file": {"attributes": {"jar_path": "/user/user2/service/build_model/buildModel.jar"}}}, "version": "0.0.1"}
After doing the analysis, it fails when saving data to MongoDB. It seems that the Java program can't see config.properties.
Can anyone explain how to use Hue Oozie to run a Java job that needs a config file?
Sorry for the late answer.
As Romain explained above, Hue copies config.properties to the same directory as BuildModel.jar. So I changed the code so that BuildModel.jar reads the config file from that same directory. It worked!
