Get application ID of spawned Oozie job - oozie

When an Oozie launcher spawns another Hadoop job, is there any way to get the application ID, or even better the ResourceManager link, to that spawned application? It seems like the Oozie launcher is only aware of its own ID.
This is with a Spark action.

You can use the built-in Oozie EL function below to get the application ID:
wf:actionExternalId(String node)
More details on the available EL functions are here: http://oozie.apache.org/docs/3.3.0/WorkflowFunctionalSpec.html#a4.2_Expression_Language_Functions
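For example, a later action in the same workflow can resolve the Spark action's external ID through this function. A minimal sketch, assuming the node is named spark-node; the email action and addresses are illustrative, not from the original question:
<action name="report">
    <email xmlns="uri:oozie:email-action:0.2">
        <to>admin@example.com</to>
        <subject>Spark job launched</subject>
        <body>External ID of spark-node: ${wf:actionExternalId('spark-node')}</body>
    </email>
    <ok to="end"/>
    <error to="fail"/>
</action>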

Related

Duplicate Java actions executed by Oozie launcher

I am facing the issue below with Oozie 5.0.0.
My Oozie workflow has a Java action which is getting executed twice by the same Oozie launcher.
I found that this happens because the Oozie launcher is allocated 2 YARN containers and each container invokes the Java action, thereby executing duplicate actions.
I would just like to know how to prevent duplicate actions from being executed.

Oozie launch script after Coordinator Start

I'm looking for a way to launch a custom script when a coordinator starts.
So when a coordinator starts the run of a job, I'd need to make, for example, an API call to a third-party service.
Is there a way or a workaround to make this possible?
Thank you
Solution found: the key is the property oozie.wf.workflow.notification.url.
Add the following parameter to the workflow configuration file:
<property>
    <name>oozie.wf.workflow.notification.url</name>
    <value>http://server_name:8080/oozieNotification/jobUpdate?jobId=$jobId%26status=$status</value>
</property>
and create a webservice listening on this URL.
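A minimal sketch of such a listener using the JDK's built-in HTTP server; the port and path mirror the URL above, and the third-party call is left as a placeholder:
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

public class OozieNotificationListener {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/oozieNotification/jobUpdate", exchange -> {
            // Oozie appends jobId and status as query parameters
            String query = exchange.getRequestURI().getQuery();
            System.out.println("Notification received: " + query);
            // ...call the third-party API from here...
            exchange.sendResponseHeaders(200, -1); // 200 OK, no response body
            exchange.close();
        });
        server.start();
    }
}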

Oozie: where does a custom EL function execute?

I am writing a custom EL function which will be used in Oozie workflows.
This custom function is just plain Java code; it doesn't contain any Hadoop code.
My question is: where will this EL function be executed at the time the workflow is running?
Will it execute my EL function on the Oozie node itself, or will it push my custom Java code to one of the data nodes and execute it there?
Oozie is a workflow scheduler system to manage jobs in a Hadoop cluster itself, integrated with the rest of the Hadoop stack and supporting several types of Hadoop jobs out of the box (such as Java MapReduce, streaming MapReduce, Pig, Hive, Sqoop and DistCp) as well as system-specific jobs (such as Java programs and shell scripts). Source
This means that if you submit a job in Oozie, it will run on any of the available DataNodes; if your Oozie service is configured on a DataNode, it can run there as well.
To check which node a job is running on, you have to look at the JobTracker in Hadoop 1 or YARN in Hadoop 2, which redirects you to the TaskTracker node where the job is being processed.
According to Apache Oozie: The Workflow Scheduler for Hadoop, page 177:
It is highly recommended that the new EL function be simple, fast and
robust. This is critical because Oozie executes the EL functions on
the Oozie server.
So it will be executed on your Oozie node itself.
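As an illustration, a custom EL function is just a public static Java method registered with the Oozie server. A minimal sketch; the class, prefix, and property value below are assumptions, so check the Oozie docs for the exact registration syntax:
// Plain Java, no Hadoop code; this runs inside the Oozie server JVM.
public class MyELFunctions {
    public static String toUpper(String s) {
        return s == null ? null : s.toUpperCase();
    }
}

// Registered in oozie-site.xml (illustrative property value):
//   <property>
//     <name>oozie.service.ELService.ext.functions.workflow</name>
//     <value>my:upper=com.example.MyELFunctions#toUpper</value>
//   </property>
// and then used in a workflow as ${my:upper('hello')}.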

IBM BPM 8.5.6 Suspend task

Is it possible to suspend a task via Process Portal in IBM BPM 8.5.6? In version 6.2 we were able to do this via the inbox or saved searches. However, the new saved searches don't support this. Is there a way to do this?
In v8.5 there is nothing like suspending a task from Process Portal, although we can suspend a task from the Admin Console.
If you want to suspend it from Process Portal, you can try the approach below:
Create a human service (HS) with a text box for accepting the PID of the process instance.
Use the JS API to suspend the task for the provided PID:
var id = tw.local.pid;
tw.system.findProcessInstanceByID(id).suspend();
Expose this human service as a Startable Service to the intended users.
Provide the PID of the instance you want to suspend.
As Jyoti Yadav already stated, you can suspend a task via the JavaScript API.
An alternative way is the /ProcessAdmin page. After logging in, choose the "Process Inspector" tab at the top of the page and search for your instances.
This is a less flexible, but more human-approachable, way of suspending a task.
You cannot suspend a task; you can only suspend the instance.
That is fine if all tasks are sequential, but what if you have parallel tasks and you want to suspend only one of them?
A good alternative that I used is:
Assign that task to the system user (bpmadmin, wasadmin, celladmin, deadmin, ... whatever you named it); your instance is then still active, but it cannot be completed until you reassign the task back to a user who finishes it.
This way you can create tasks now and, with a timer, assign them to a user/group at a specified time.
The code I used:
tw.system.findTaskByID("123456").reassignTo("bpmadmin");
to assign it to the system user so no one can see it, then in the timer script:
tw.system.findTaskByID("123456").reassignBackToRole();
to assign it back to the group to be executed.
You can suspend them either through the Process Inspector or via the REST API calls that BPM provides. The URL can be:
https://baseroot:9443/bpmrest-ui/BPMRestAPITester/index.jsp
In answer to the follow-up question, you can put it in the administration portal by exposing it as an "administration service" instead of a "human service".
https://ip:port/rest/bpm/wle/v1/process/xx?action=suspend&parts=all (PUT)
and
https://ip:port/rest/bpm/wle/v1/task?action=cancel&taskIDs=? (PUT)
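For example, the process suspend call above can be issued from Java 11's HttpClient. A minimal sketch; the host, port, credentials, and instance ID 1234 are illustrative placeholders, and the server's certificate is assumed to be trusted:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class SuspendInstance {
    public static void main(String[] args) throws Exception {
        // BPM REST API uses HTTP basic authentication
        String auth = Base64.getEncoder().encodeToString("bpmadmin:password".getBytes());
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://host:9443/rest/bpm/wle/v1/process/1234?action=suspend&parts=all"))
                .header("Authorization", "Basic " + auth)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}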
You can refer to the API documentation of v8.5.6 for task suspension, and then call the suspend API with your own task ID.

Apache Mesos Workflows - Event Driven Scheduler

We are currently using Apache Mesos with Marathon and Chronos to schedule long-running and batch processes.
It would be great if we could create more complex workflows, as with Oozie: say, for example, kicking off a job when a file appears in a location, or when a certain application completes or calls an API.
While it seems we could do this with Marathon/Chronos or Singularity, there seems to be no readily available interface for this.
You can use Chronos' /scheduler/dependency endpoint to specify "all jobs which must run at least once before this job will run." Do this on each of your Chronos jobs, and you can build arbitrarily complex workflow DAGs.
https://airbnb.github.io/chronos/#Adding%20a%20Dependent%20Job
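For illustration, a dependent job is registered by POSTing JSON with a parents list to that endpoint. A minimal sketch; the job names, command, and owner are made up:
{
  "name": "process-file",
  "command": "bash /opt/jobs/process.sh",
  "parents": ["wait-for-file"],
  "owner": "ops@example.com"
}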
Chronos currently only schedules jobs based on time or dependency triggers. Other events, like a file update, a git push, or an email/tweet, could be modeled as a wait-for-X job that your target job would then depend on.
