How to change a value in an Oozie job coordinator?

I have a mapreduce job which is scheduled by an Oozie coordinator and runs every 4 hours. This mapreduce job takes a parameter, let's say k, whose value is set in the job.config file. I'd like to know: if I change the value of this parameter between two runs, does it pick up the updated (new) value, or does it stick to the original (old) value?

If the job is already running, it will stick to the old parameter value; if the run is still waiting to be scheduled, it will pick up the latest value.

Actually, there is a devious way to "dynamically" fetch a parameter value at run time:

- insert a dummy Shell action at the beginning of the workflow, with the <capture-output/> option set
- in the shell script, just download a properties file from HDFS and dump it to STDOUT
- the "capture-output" option tells Oozie to parse STDOUT into a map (i.e. a key/value list)
- then use an E.L. function to retrieve the appropriate value(s) in the next actions, as sketched below:
  ${wf:actionData("DummyShellAction")["some.key"]}
http://oozie.apache.org/docs/4.0.0/WorkflowFunctionalSpec.html#a4.2.6_Hadoop_Jobs_EL_Function
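A minimal sketch of that pattern in workflow XML (the action name DummyShellAction comes from the EL example above; the script name fetch_props.sh, the downstream node names, and the properties path are illustrative assumptions):

    <workflow-app name="dynamic-param-wf" xmlns="uri:oozie:workflow:0.5">
        <start to="DummyShellAction"/>

        <!-- Dummy shell action: dumps key=value pairs to STDOUT;
             <capture-output/> tells Oozie to parse them into a map -->
        <action name="DummyShellAction">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <exec>fetch_props.sh</exec>
                <file>${wf:appPath()}/fetch_props.sh#fetch_props.sh</file>
                <capture-output/>
            </shell>
            <ok to="real-action"/>
            <error to="fail"/>
        </action>

        <!-- Downstream actions read the captured values with the EL
             function, e.g. ${wf:actionData("DummyShellAction")["some.key"]} -->
        ...
    </workflow-app>

The script itself can be a one-liner such as hdfs dfs -cat /user/me/conf/runtime.properties, so each run re-reads whatever is currently in the file on HDFS.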

Related

Check value of a variable while R session is running

Is there a way to check the value of a variable in the script while it is running? One way could be to set up print commands in the script to print the value of the variable periodically. But I forgot to do that and it is a long program. Is there another way?

Airflow - How to pass the output of one operator as input to another task

I have a list of HTTP endpoints, each performing a task on its own. We are trying to write an application which will orchestrate them by invoking these endpoints in a certain order. In this solution we also have to process the output of one HTTP endpoint and generate the input for the next HTTP endpoint. Also, the same workflow can get invoked simultaneously depending on the trigger.
What I have done so far:
1. Defined a new operator deriving from the HttpOperator and introduced the capability to write the output of the HTTP endpoint to a file.
2. Written a Python operator which can transfer the output depending on the necessary logic.
Since I can have multiple instances of the same workflow in execution, I cannot hardcode the output file names. Is there a way to make the HTTP operator I wrote write to a unique file name, and to make the same file name available to the next task so that it can read and process the output?
Airflow does have a feature for operator cross-communication called XCom
XComs can be “pushed” (sent) or “pulled” (received). When a task pushes an XCom, it makes it generally available to other tasks. Tasks can push XComs at any time by calling the xcom_push() method.
Tasks call xcom_pull() to retrieve XComs, optionally applying filters based on criteria like key, source task_ids, and source dag_id.
To push a value to XCom, use
ti.xcom_push(key=<variable name>, value=<variable value>)
To pull an XCom value, use
myxcom_val = ti.xcom_pull(key=<variable name>, task_ids='<task to pull from>')
With the BashOperator, you just set xcom_push=True and the last line written to stdout is stored as the XCom value.
You can view the XCom values while your task is running by opening the task instance in the Airflow UI and clicking on the XCom tab.
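A minimal sketch of this approach, using the Airflow 1.x-style API to match the snippets above; the DAG name, task ids, and the run_id-based file-naming scheme are illustrative assumptions. Each run builds its file name from its own run_id, so concurrent runs of the same workflow never collide:

    # Hypothetical sketch: unique-per-run file name passed between tasks via XCom.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator


    def call_endpoint(**context):
        # Build a file name that is unique per DAG run.
        output_file = "/tmp/endpoint_output_{}.json".format(context["run_id"])
        # ... invoke the HTTP endpoint here and write its response to output_file ...
        context["ti"].xcom_push(key="output_file", value=output_file)


    def transform_output(**context):
        # Pull the file name pushed by the upstream task in this same run.
        output_file = context["ti"].xcom_pull(key="output_file",
                                              task_ids="call_endpoint")
        # ... read output_file and build the input for the next endpoint ...


    with DAG("endpoint_pipeline",
             start_date=datetime(2018, 1, 1),
             schedule_interval=None) as dag:
        call = PythonOperator(task_id="call_endpoint",
                              python_callable=call_endpoint,
                              provide_context=True)
        transform = PythonOperator(task_id="transform_output",
                                   python_callable=transform_output,
                                   provide_context=True)
        call >> transform

Because the name is pushed once and pulled by task id, every run resolves its own file even when several runs execute simultaneously.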

How to loop workflows with a counter in Oozie using HUE 3.11?

I have a workflow that starts with a shell script node that accepts a numeric parameter and directs execution into different Hive scripts based on this parameter. How do I loop this workflow so it executes over a range of numbers as the parameter?
What I do right now is change the parameter in the GUI, execute, wait for it to finish, then change the parameter to the next number and rerun again.
You can achieve this using a sub-workflow; read the following blog post to understand how to implement it: http://www.helmutzechmann.com/2015/04/23/oozie-loops/
The shell action output can be captured and accessed by the other actions:
${wf:actionData('shellAction')['variablename']}
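A rough sketch of that pattern: the workflow calls itself as a sub-workflow until the counter leaves the range. The action names, the step.sh script, and the counter/continue keys it prints to STDOUT are illustrative assumptions, not taken from the blog post:

    <workflow-app name="loop-wf" xmlns="uri:oozie:workflow:0.5">
        <start to="shellAction"/>

        <!-- Runs the Hive work for the current counter value, then prints
             the next counter and a continue flag for <capture-output/> -->
        <action name="shellAction">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <exec>step.sh</exec>
                <argument>${counter}</argument>
                <file>${wf:appPath()}/step.sh#step.sh</file>
                <capture-output/>
            </shell>
            <ok to="checkCounter"/>
            <error to="fail"/>
        </action>

        <!-- Loop while step.sh reports more numbers remain in the range -->
        <decision name="checkCounter">
            <switch>
                <case to="nextIteration">
                    ${wf:actionData('shellAction')['continue'] eq "true"}
                </case>
                <default to="end"/>
            </switch>
        </decision>

        <!-- Recurse: run this same workflow again with the next counter -->
        <action name="nextIteration">
            <sub-workflow>
                <app-path>${wf:appPath()}</app-path>
                <propagate-configuration/>
                <configuration>
                    <property>
                        <name>counter</name>
                        <value>${wf:actionData('shellAction')['counter']}</value>
                    </property>
                </configuration>
            </sub-workflow>
            <ok to="end"/>
            <error to="fail"/>
        </action>

        <kill name="fail">
            <message>Loop step failed</message>
        </kill>
        <end name="end"/>
    </workflow-app>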
Hope this helps.

Pass a directory name from one coordinator to another in Oozie

I have a coordinator-A running whose workflow generates output to a directory
/var/test/output/20161213-randomnumber/
Now I need to pass the directory name "20161213-randomnumber" to another coordinator-B, which needs to start as soon as the workflow of coordinator-A is completed.
I am not able to find any pointers on how to pass the directory name, or how coordinator-B can be triggered by the directory generated by coordinator-A.
However, I have seen numerous examples of triggering coordinators for a specific date, or for daily, weekly, or monthly datasets. In my case the dataset is not time dependent; it can arrive arbitrarily.
In your case, you can add one more action that puts an empty trigger file (trig.txt) after your data-generation action (the one that writes /var/test/output/20161213-randomnumber/) in coordinator-A's workflow. Then, in coordinator-B, add a data dependency pointing to the trigger file; if it is present, coordinator-B will start. Once B has started, you can delete the trigger file to re-arm the dependency for the next run.
You can use this data dependency to solve the problem; you cannot pass a parameter from one coordinator to another coordinator.
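A rough sketch of coordinator-B under the trigger-file approach above. All names, paths, and frequencies are placeholders, and note that Oozie datasets still require a nominal frequency and initial-instance even when the data does not really arrive periodically:

    <coordinator-app name="coordinator-B" frequency="${coord:days(1)}"
                     start="${startTime}" end="${endTime}" timezone="UTC"
                     xmlns="uri:oozie:coordinator:0.4">
        <datasets>
            <!-- The empty trig.txt written by coordinator-A's workflow
                 acts as the done-flag for this dataset -->
            <dataset name="upstream" frequency="${coord:days(1)}"
                     initial-instance="${startTime}" timezone="UTC">
                <uri-template>${nameNode}/var/test/output</uri-template>
                <done-flag>trig.txt</done-flag>
            </dataset>
        </datasets>
        <input-events>
            <data-in name="upstreamReady" dataset="upstream">
                <instance>${coord:current(0)}</instance>
            </data-in>
        </input-events>
        <action>
            <workflow>
                <app-path>${coordBWorkflowPath}</app-path>
            </workflow>
        </action>
    </coordinator-app>

Coordinator-B then waits until /var/test/output/trig.txt exists before materializing its workflow; deleting the file after B starts resets the dependency for the next run.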

How to prevent sbt from running a task multiple times in a session?

I'd like to prevent the following task from getting run multiple times when sbt is running:
val myTask = someSettings map { s => if (!s.isDone) doSomethingAndSetTheFlag() }
So the expected behavior is: when myTask is run for the first time, isDone is false, something gets done in the task, and the task then sets the flag to true. When the task is run a second time, since the isDone flag is true, it skips the actual execution block.
The expected behavior is similar to compile: once the source is compiled, the task doesn't compile the code again the next time it's triggered, until watchSources reports that the code has changed.
Is it possible? How?
sbt already does this: a task is evaluated only once within a single run. If you want a value evaluated only once, at project load time, you can change it to be a SettingKey.
This is documented in the sbt documentation (highlighting is mine):

"As mentioned in the introduction, a task is evaluated on demand. Each time sampleTask is invoked, for example, it will print the sum. If the username changes between runs, stringTask will take different values in those separate runs. (Within a run, each task is evaluated at most once.) In contrast, settings are evaluated once on project load and are fixed until the next reload."
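A small build.sbt sketch of that difference (the key names sampleTask and stringTask echo the quoted docs; the bodies are illustrative):

    // Hypothetical build.sbt (sbt 1.x) contrasting tasks and settings.
    val sampleTask = taskKey[Int]("A sample task that sums two numbers.")
    val stringTask = taskKey[String]("A task that depends on sampleTask.")
    val sampleSetting = settingKey[String]("Evaluated once at project load.")

    sampleTask := {
      val sum = 1 + 2
      println("sum: " + sum)  // runs on every invocation of sampleTask...
      sum                     // ...but at most once within a single evaluation
    }

    stringTask := {
      // Reading sampleTask.value here does NOT re-run it if it has already
      // run in the same evaluation -- its cached result is reused.
      "sum is " + sampleTask.value
    }

    sampleSetting := {
      // Settings are evaluated once on project load and fixed until reload.
      sys.props.getOrElse("user.name", "unknown")
    }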
