How to pass an empty parameter to oozie coordinator - oozie

I am passing a variable ${prefix} to the Oozie coordinator. For testing I want to pass "test", and for production I want to pass an empty string.
How do I achieve that?
<coordinator-app name="${prefix}Summary" start="${start}" end="${end}" timezone="${timezone}" xmlns="uri:oozie:coordinator:0.1">
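One way to do this (a sketch only, assuming prefix is supplied through the job.properties used at submission) is to define the variable in both environments and simply leave the value empty in production, which should make ${prefix} resolve to an empty string:

# job.properties for a test run: the coordinator is named "testSummary"
prefix=test

# job.properties for a production run: the property is defined but left
# empty, so ${prefix} resolves to "" and the coordinator is named "Summary"
prefix=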

Related

How to modify DAG parameter that has a default value when triggering a DAG manually

I am interested in using a parameter when triggering a DAG manually, per https://airflow.apache.org/docs/apache-airflow/stable/dag-run.html#passing-parameters-when-triggering-dags.
In my case, the argument would be days_of_data, and it should default to 7 unless we pass the argument as JSON when triggering manually. So we could manually trigger the DAG, and if no parameter is passed, its value would be 7 anyway.
First, make sure that the argument days_of_data is a templated field in the operator you are calling. After that, you just have to set a default value in the operator as follows:
"{{ dag_run.conf['days_of_data'] or 7 }}"
This will set days_of_data to 7 unless you pass the following JSON when manually triggering the DAG (either from the CLI or the UI):
{"days_of_data": x}
Where x can be any value. Please note that this parameter will be a string, so you may need to convert it to int or another type before using it.
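A minimal sketch of this in a DAG file (Airflow 2.x; the dag and task ids are illustrative, and .get is used instead of ['days_of_data'] so the expression also renders cleanly when no conf is supplied):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="days_of_data_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # manual triggers only
) as dag:
    # bash_command is a templated field, so the Jinja expression is rendered
    # at run time and falls back to 7 when no parameter is passed
    use_days = BashOperator(
        task_id="use_days_of_data",
        bash_command="echo days_of_data={{ dag_run.conf.get('days_of_data', 7) if dag_run.conf else 7 }}",
    )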

Airflow - How to pass the output of one operator as input to another task

I have a list of http endpoints, each performing a task on its own. We are trying to write an application which will orchestrate them by invoking these endpoints in a certain order. In this solution we also have to process the output of one http endpoint and generate the input for the next http endpoint. Also, the same workflow can get invoked simultaneously depending on the trigger.
What I have done until now,
1. Have defined a new operator deriving from the HttpOperator and introduced capabilities to write the output of the http endpoint to a file.
2. Have written a python operator which can transfer the output depending on the necessary logic.
Since I can have multiple instances of the same workflow in execution, I cannot hardcode the output file names. Is there a way to make the http operator I wrote write to unique file names, and to make the same file name available to the next task so that it can read and process the output?
Airflow does have a feature for operator cross-communication called XCom:
XComs can be “pushed” (sent) or “pulled” (received). When a task pushes an XCom, it makes it generally available to other tasks. Tasks can push XComs at any time by calling the xcom_push() method.
Tasks call xcom_pull() to retrieve XComs, optionally applying filters based on criteria like key, source task_ids, and source dag_id.
To push an XCom value, use:
ti.xcom_push(key=<variable name>, value=<variable value>)
To pull an XCom value, use:
myxcom_val = ti.xcom_pull(key=<variable name>, task_ids='<task to pull from>')
With the BashOperator, you just set xcom_push=True (do_xcom_push=True in Airflow 2) and the last line written to stdout is set as the XCom value.
You can view the XCom value while your task is running by simply opening the task instance from the Airflow UI and clicking on the XCom tab.
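A minimal sketch of the push/pull round trip between two PythonOperators (Airflow 2.x; dag, task, and key names are illustrative — in your case the pushed value would be the unique output file name):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def produce(ti):
    # push the unique file name this run wrote its output to
    ti.xcom_push(key="output_file", value="/tmp/run-outputs/output-123.json")

def consume(ti):
    # pull it back by key and producing task id
    path = ti.xcom_pull(key="output_file", task_ids="produce")
    print(f"processing {path}")

with DAG(dag_id="xcom_example", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    t1 = PythonOperator(task_id="produce", python_callable=produce)
    t2 = PythonOperator(task_id="consume", python_callable=consume)
    t1 >> t2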

How to loop workflows with a counter in Oozie using HUE 3.11?

I have a workflow that starts with a shell script node that accepts a numeric parameter and directs into different hive scripts using this parameter. How do I loop this workflow so it executes over a range of numbers as the parameter?
What I do right now is I change parameter in the GUI, execute, wait for it to finish, then change the parameter for the next number and rerun again.
You can achieve this using a sub-workflow; read the following blog to understand how to implement it: http://www.helmutzechmann.com/2015/04/23/oozie-loops/
The shell action output can be captured and accessed by the other actions via:
${wf:actionData('shellAction')['variablename']}
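Roughly, the pattern from that blog looks like this (a sketch only; the names, step.sh script, and the limit of 10 are illustrative). Because an Oozie workflow is a DAG and cannot contain cycles, the "loop" is a workflow that re-invokes itself through a sub-workflow action with an incremented counter:

<workflow-app name="loop-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="shellAction"/>

    <action name="shellAction">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>step.sh</exec>
            <argument>${counter}</argument>
            <!-- step.sh does the work for ${counter} and prints next_counter=... -->
            <capture-output/>
        </shell>
        <ok to="checkDone"/>
        <error to="fail"/>
    </action>

    <decision name="checkDone">
        <switch>
            <!-- recurse while the next counter is below the (illustrative) limit -->
            <case to="recurse">${wf:actionData('shellAction')['next_counter'] lt 10}</case>
            <default to="end"/>
        </switch>
    </decision>

    <action name="recurse">
        <sub-workflow>
            <app-path>${wf:appPath()}</app-path>
            <propagate-configuration/>
            <configuration>
                <property>
                    <name>counter</name>
                    <value>${wf:actionData('shellAction')['next_counter']}</value>
                </property>
            </configuration>
        </sub-workflow>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail"><message>loop step failed</message></kill>
    <end name="end"/>
</workflow-app>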
Hope this helps.

pass a directory name from one coordinator to another in oozie

I have a coordinator-A running whose workflow generates output to a directory
/var/test/output/20161213-randomnumber/
Now I need to pass the directory name "20161213-randomnumber" to another coordinator-B, which needs to start as soon as the workflow of coordinator-A is completed.
I am not able to find any pointers on how to pass the directory name, or on how coordinator-B can be triggered by the directory generated by coordinator-A.
However, I have seen numerous examples of triggering coordinators on a specific date, or on a daily, weekly, or monthly dataset. In my case the dataset is not time dependent; it can arrive arbitrarily.
In your case, you can add one more action that puts an empty trigger file (trig.txt) after your data-generation action (the one writing /var/test/output/20161213-randomnumber/) in coordinator A. Then, in coordinator B, add a data dependency pointing to the trigger file; if it is present, coordinator B will start. Once B has started, you can remove the trigger file for the next run.
You can use this data dependency to solve the problem (see the sketch below); you cannot pass a parameter from one coordinator to another.
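A minimal sketch of coordinator B using such a trigger file as the dataset's done-flag (URIs, frequencies, and names are illustrative; note that the dataset URI must be predictable, so the trigger file has to land in a date-based path rather than one with a random suffix):

<coordinator-app name="coordinator-B" frequency="${coord:days(1)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <datasets>
        <dataset name="a-output" frequency="${coord:days(1)}"
                 initial-instance="${start}" timezone="UTC">
            <uri-template>${nameNode}/var/test/output/${YEAR}${MONTH}${DAY}</uri-template>
            <!-- B only starts once coordinator A has written trig.txt here -->
            <done-flag>trig.txt</done-flag>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="input" dataset="a-output">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${workflowAppPath}</app-path>
            <configuration>
                <property>
                    <name>inputDir</name>
                    <value>${coord:dataIn('input')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>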

How to change value in an oozie job coordinator?

I have a mapreduce job which is scheduled by an oozie coordinator and runs every 4 hours. This mapreduce job takes a parameter, let's say k, whose value is set in the job.config file. I'd like to know if I change the value of this parameter between two runs, does it pick the updated (new) value or it sticks to the original (old) value?
If the job is already running, it will stick to the old parameter; if it is still waiting to be scheduled, it will take the latest value :).
Actually, there is a devious way to "dynamically" fetch a parameter value at run time:
1. insert a dummy Shell Action at the beginning of the Workflow, with the <capture-output/> option set
2. in the shell script, just download a properties file from HDFS and dump it to STDOUT
3. the "capture-output" option tells Oozie to parse STDOUT into a Map (i.e. a key/value list)
4. then use an E.L. function to retrieve the appropriate value(s) in the next Actions:
${wf:actionData("DummyShellAction")["some.key"]}
http://oozie.apache.org/docs/4.0.0/WorkflowFunctionalSpec.html#a4.2.6_Hadoop_Jobs_EL_Function
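A sketch of that dummy action (names and the HDFS path are illustrative; the referenced script just cats a properties file so Oozie can capture key=value lines from STDOUT):

<!-- fetch_props.sh contains something like:
         #!/bin/bash
         hdfs dfs -cat /apps/myjob/runtime.properties
-->
<action name="DummyShellAction">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>fetch_props.sh</exec>
        <file>${wf:appPath()}/fetch_props.sh</file>
        <capture-output/>
    </shell>
    <ok to="firstRealAction"/>
    <error to="fail"/>
</action>

A later action can then reference any captured value, for example:

<property>
    <name>k</name>
    <value>${wf:actionData("DummyShellAction")["some.key"]}</value>
</property>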
