Duplicate Java actions executed by Oozie launcher

I am facing the following issue with Oozie 5.0.0.
My Oozie workflow has a Java action which is being executed twice by the same Oozie launcher.
I found that this happens because the Oozie launcher is allocated 2 YARN containers, and each container invokes the Java action, thereby executing it twice.
I would just like to know how to avoid the duplicate execution.
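One defensive workaround while the duplicate container allocation is investigated: make the Java action idempotent, so a second invocation becomes a no-op. A minimal sketch, assuming the action can write a marker file to HDFS; the class name, marker path, and argument wiring are hypothetical:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MyJavaAction {
    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical marker location, passed in from the workflow definition.
        Path marker = new Path(args[0]);
        // createNewFile is atomic on HDFS: only one container wins the race,
        // so the duplicate invocation exits without doing the work twice.
        if (!fs.createNewFile(marker)) {
            System.out.println("Marker already exists; skipping duplicate run.");
            return;
        }
        // ... the actual work of the action goes here ...
    }
}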

Related

Google Cloud RunTask before it's scheduled to run

When using Google Cloud Tasks, how can I prematurely run a task that is in the queue? I have a need to run the task before it's scheduled to run. For example, the user chooses to navigate away from the page and they are prompted. If they accept the prompt to move away from that page, I need to clear the queued task item programmatically.
I will be running this with a firebase-function on the backend.
Looking at the API for Cloud Tasks found here, it seems we have primitives to:
list - get a list of tasks that are queued to run
delete - delete a task that is queued to run
run - forces a task to run now
Based on these primitives, we seem to have all the "bits" necessary to achieve what you're asking.
For example:
To run a task now that is scheduled to run in the future:
List all the tasks
Find the task that you want to run now
Delete the task
Run a task (now) using the details of the retrieved task
We appear to have a REST API as well as client libraries for the popular languages; a sketch using the Java client library follows.
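A minimal sketch of that flow with the Java client library (google-cloud-tasks); the project, location, queue, and task names are placeholders:

import com.google.cloud.tasks.v2.CloudTasksClient;
import com.google.cloud.tasks.v2.QueueName;
import com.google.cloud.tasks.v2.Task;

public class RunQueuedTaskNow {
    public static void main(String[] args) throws Exception {
        QueueName queue = QueueName.of("my-project", "us-central1", "my-queue");
        try (CloudTasksClient client = CloudTasksClient.create()) {
            // List the queued tasks and find the one we care about.
            for (Task task : client.listTasks(queue).iterateAll()) {
                if (task.getName().endsWith("/tasks/my-task-id")) {
                    // runTask dispatches the queued task immediately; to
                    // clear it instead, call client.deleteTask(task.getName()).
                    client.runTask(task.getName());
                    break;
                }
            }
        }
    }
}

Note that runTask acts on the existing queued task, so the delete step above is only needed when you want to cancel the task rather than run it early.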

How Can You Run CRaSH Commands Or A Script On Node Startup?

I need to initialise my Corda nodes by running a few flows to create certain states.
At the moment I am doing it via the CRaSH shell.
e.g.
flow start IOUFlow iouValue: 50, counterparty: Bank1
Is it possible to have the node run a script or some commands on node startup to do this automatically?
If not, how can I write a bash script to automate these CRaSH commands?
Corda 4.4 introduces a new feature to register actions to be performed on node startup.
You could register an action to be performed on node startup using a CordaService.
appServiceHub.register(
    AppServiceHub.SERVICE_PRIORITY_NORMAL,
    event -> {
        // Your custom code to be run on startup.
    }
);
You might want to check the event type to keep it future-proof, but currently ServiceLifecycleEvent has just a single value, STATE_MACHINE_STARTED.
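For context, a minimal sketch of wiring this into a CordaService; the class name is illustrative, and the flow-start comment stands in for your own startup logic:

import net.corda.core.node.AppServiceHub;
import net.corda.core.node.services.CordaService;
import net.corda.core.node.services.ServiceLifecycleEvent;
import net.corda.core.serialization.SingletonSerializeAsToken;

// Instantiated automatically by the node when the CorDapp loads.
@CordaService
public class StartupActionService extends SingletonSerializeAsToken {
    public StartupActionService(AppServiceHub appServiceHub) {
        appServiceHub.register(AppServiceHub.SERVICE_PRIORITY_NORMAL, event -> {
            // Guard on the event type so future lifecycle events are ignored.
            if (event == ServiceLifecycleEvent.STATE_MACHINE_STARTED) {
                // Safe to start flows here via appServiceHub.startFlow(...),
                // replacing the manual CRaSH command from the question.
            }
        });
    }
}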

Oozie JA009 error on sub-action

I am experiencing the following situation: I have created and launched an Oozie workflow comprising two Hive actions. Moments after the workflow starts, the first action gets a JA009 error and the workflow is marked as SUSPENDED. Now the interesting part: the first action actually continues running and succeeds, although marked with the above error; at this point the workflow is stuck, never passing to the second action.
Any ideas on how to debug this?
Error message:
JA009: Cannot initialize Cluster. Please check your configuration for
mapreduce.framework.name and the correspond server addresses.
Env info:
Oozie 4.2.0.2.5.3.0-37
Hadoop 2.7.3.2.5.3.0-37
Hive 1.2.1000.2.5.3.0-37

Get application id of spawned oozie job

When an Oozie launcher spawns another Hadoop job, is there any way to get the application ID, or even better the ResourceManager link, of that spawned application? It seems like the Oozie launcher is only aware of its own ID.
This is with a Spark action.
You can use the built-in Oozie EL function below to get the application ID.
wf:actionExternalId(String node)
More details on available EL functions here: http://oozie.apache.org/docs/3.3.0/WorkflowFunctionalSpec.html#a4.2_Expression_Language_Functions
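If you need the same information from outside the workflow, here is a sketch using the Oozie Java client API; the server URL and job ID are placeholders:

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowAction;
import org.apache.oozie.client.WorkflowJob;

public class PrintExternalIds {
    public static void main(String[] args) throws Exception {
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");
        WorkflowJob job = oozie.getJobInfo("0000001-190101000000000-oozie-W");
        for (WorkflowAction action : job.getActions()) {
            // getExternalId() is the ID Oozie recorded for the action itself;
            // getExternalChildIDs() lists jobs the launcher spawned, which is
            // usually what you want for a Spark action.
            System.out.println(action.getName()
                    + " external: " + action.getExternalId()
                    + " children: " + action.getExternalChildIDs());
        }
    }
}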

Oozie: Where does a custom EL function execute?

I am writing a custom EL function which will be used in Oozie workflows.
This custom function is just plain Java code; it doesn't contain any Hadoop code.
My question is: where will this EL function be executed at the time the workflow is running?
Will it execute on the Oozie node itself, or will Oozie push my custom Java code to one of the data nodes and execute it there?
Oozie is a workflow scheduler system to manage jobs in a Hadoop cluster. It is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system-specific jobs (such as Java programs and shell scripts). Source
This means that if you submit a job in Oozie, it can run on any of the available DataNodes; if your Oozie service is itself configured on a DataNode, it can run there as well.
To check which node a job is running on, you have to look it up in the JobTracker (Hadoop 1) or YARN (Hadoop 2), which points to the worker node (TaskTracker or NodeManager) where the job is being processed.
According to Apache Oozie: The Workflow Scheduler for Hadoop, page 177:
It is highly recommended that the new EL function be simple, fast and
robust. This is critical because Oozie executes the EL functions on
the Oozie server
So it will be executed on your Oozie node itself.
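To make that concrete, here is a minimal sketch of a custom EL function; the class, method, and prefix names are illustrative, and you should verify the exact oozie-site.xml registration property for your Oozie version:

// A custom Oozie EL function is just a public static Java method,
// registered in oozie-site.xml with an entry along the lines of:
//   oozie.service.ELService.ext.functions.workflow =
//       myns:toUpper=com.example.oozie.MyELFunctions#toUpper
package com.example.oozie;

public final class MyELFunctions {
    private MyELFunctions() {}

    // Invoked as ${myns:toUpper('hello')} in the workflow definition.
    // Per the quote above, this runs on the Oozie server, so keep it
    // simple, fast and robust.
    public static String toUpper(String input) {
        return input == null ? null : input.toUpperCase();
    }
}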
