Check for Oozie optional parameter

I have created an Oozie workflow that uses the script action. The script that it calls has two mandatory parameters and several optional parameters. What is the correct way to handle optional parameters in an Oozie workflow?
oozie job -config job.properties -run -DMandatory1 a -DMandatory2 b -DOptional1 c
I cannot list the parameters in the workflow XML (Optional2, Optional3, etc.) because Oozie will error out stating that the parameter does not exist. Do I need to create multiple workflows and add some logic prior to calling Oozie that would allow for each option?

Did you try passing empty strings as the parameter values, e.g. -Doptional1 '' on the command line?
If the shell script is smart enough to ignore empty parameters (e.g. $# -ge 3 but "$3" == "" means "no parameter 3") then the result will be the same as not passing the parameter.
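A minimal sketch of such a check, assuming the script receives the two mandatory values as $1 and $2 and the optional one as $3 (the tool and flag names are hypothetical):
#!/bin/bash
mandatory1="$1"
mandatory2="$2"
optional1="$3"   # empty string when the workflow passes ''
extra_args=""
if [ -n "$optional1" ]; then
    # only forward the optional flag when a real value was supplied
    extra_args="--optional1 $optional1"   # sketch assumes the value contains no spaces
fi
my_tool --mandatory1 "$mandatory1" --mandatory2 "$mandatory2" $extra_args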

If the number of optional parameters is small, you can pass some default dummy value for each optional variable and then check for that value in your shell script: if the dummy value comes through, don't use it; otherwise use it.
This is necessary because if you do not supply a value for a variable referenced in your shell or ssh action in the workflow, Oozie will fail with an EL error, as it will not be able to replace the optional variable with anything.
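A minimal sketch of that check, assuming the agreed dummy default is the literal string NONE (the value and variable name are placeholders):
#!/bin/bash
optional1="$1"
if [ "$optional1" = "NONE" ]; then
    echo "Optional1 not supplied, ignoring it"
else
    echo "Using Optional1=$optional1"
fi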

Related

How to add Info to custom keyword Log

In Robot Framework, when you create a custom keyword in the *** Keyword *** section of a .robot file, is there a way to print an INFO message in the log file? I've tried using the BuiltIn.Log keyword, but it creates a new keyword entry in the log under which the INFO is written.
I want the INFO to appear directly under my custom keyword (screenshot: "Info in Keyword execution"), but currently my only option is to have it nested under the Log keyword (screenshot: "Info inside BuiltIn.Log definition").
Is there a way to add INFO directly to my custom keyword without using Python API?
Did you try Log To Console, like this?
Log To Console    Typing text ${User} into text field 'username'
To my knowledge, what you are attempting is unfortunately not doable without the Python API. This style of embedded message can be produced with robot.api.logger or Python's logging API - more info in the Robot Framework User Guide.
However, in addition to using the Log keyword, you may alleviate the need by adding a documentation string to your keywords - the first line is always shown in the Documentation section of the keyword. Additionally, by enabling TRACE level on the log file you will get at least the arguments and return values shown for each keyword.
The documentation is added with the [Documentation] setting, similar to:
Custom Keyword
    [Documentation]    This string is shown completely until I leave at least
    ...    one empty row.
    ...
    ...    This is shown only in the library documentation file.
Logging levels are changed with the launch option -L or --loglevel; to enable TRACE level, simply add the option when launching your tests:
robot -t TestName -s SuiteName -L TRACE .\Path\to\Tests
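For completeness, if dropping down to the Python API turns out to be acceptable after all, the embedded style of message asked about can be produced with robot.api.logger from a library keyword; a rough sketch (the module and keyword names are made up):
# MyLibrary.py - hypothetical keyword library
from robot.api import logger

def type_into_username_field(user):
    # this INFO line appears directly under the keyword in log.html
    logger.info("Typing text %s into text field 'username'" % user)
    # ... the actual typing would happen here ...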

dynamic task id names in Airflow

I have a DAG with one DataflowTemplateOperator that can deal with different JSON files. When I trigger the DAG I pass some parameters via {{dag_run.conf['param1']}} and it works fine.
The issue I have is trying to rename the task_id based on param1.
i.e.
task_id="df_operator_read_object_json_file_{{dag_run.conf['param1']}}",
complains that only alphanumeric characters are allowed, and
task_id="df_operator_read_object_json_file_{}".format(dag_run.conf['param1']),
does not recognise dag_run, on top of the alphanumeric issue.
The whole idea behind this is that when I look at the Dataflow jobs console and a job has failed, I know who the offender is based on param1. Dataflow job names are based on task_id, like this:
df-operator-read-object-json-file-8b9eecec
and what I need is this:
df-operator-read-object-param1-json-file-8b9eecec
Any ideas if this is possible?
There is no need to generate a new operator per file.
DataflowTemplatedJobStartOperator has job_name parameter which is also templated so can be used with Jinja.
I didn't test it but this should work:
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator

op = DataflowTemplatedJobStartOperator(
    task_id="df_operator_read_object_json_file",
    job_name="df_operator_read_object_json_file_{{dag_run.conf['param1']}}",
    template='gs://dataflow-templates/your_template',
    location='europe-west3',
)
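For reference, the dag_run.conf values used in the template come from the trigger call; with the Airflow 2 CLI that looks roughly like this (the DAG id and value are placeholders):
airflow dags trigger --conf '{"param1": "my_file"}' my_dag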

Oozie coordinator done flag with variable in uri-template

I need a done flag with a variable in the URI.
According to documentation: https://oozie.apache.org/docs/3.1.3-incubating/CoordinatorFunctionalSpec.html#a5.1._Synchronous_Datasets it is possible.
My uri-template looks like:
<uri-template>
hdfs://foo:9000/app/logs/${YEAR}${MONTH}/${DAY}/${hour_int}
</uri-template>
and I defined the ${hour_int} variable in Advanced settings / Workflow properties.
I get the error:
E1004: Expression language evaluation error, Unable to evaluate :hdfs://foo:9000/app/logs/${YEAR}${MONTH}/${DAY}/${hour_int}:
I need help figuring out where I am going wrong.

How does one create optional command line arguments in oozie workflow xml

Please bear in mind that I'm a complete rookie with Oozie. I know that one can specify command line arguments in the Oozie workflow XML by using the arg tag. I wondered how it is possible to specify an optional command line argument such that Oozie will not complain that a required parameter is missing if the user doesn't specify it.
Many thanks in advance. If the information I've given is not specific enough, I can provide a concrete example when I log into my work machine tomorrow. We use Apache Commons CLI options to parse the options.
E.g. I want to make the following argument optional:
-e${endDateTime}
In your workflow wherever you would use ${myparam}, replace it with ${firstNotNull(wf:conf('myparam'), 'mydefaultvalue')}
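As an illustration (untested sketch), a java action using this for the -e${endDateTime} case could look roughly like the following; the action name and main class are placeholders:
<action name="my-java-step">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <main-class>com.example.MyTool</main-class>
        <!-- falls back to 'mydefaultvalue' when endDateTime is not supplied -->
        <arg>-e${firstNotNull(wf:conf('endDateTime'), 'mydefaultvalue')}</arg>
    </java>
    <ok to="end"/>
    <error to="fail"/>
</action>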
In theory you should be able to use a "config-default.xml" file next to your "workflow.xml" file to give default values to the params in the workflow (see https://oozie.apache.org/docs/3.2.0-incubating/WorkflowFunctionalSpec.html) but I couldn't get it working.
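For reference, such a config-default.xml is just a Hadoop-style configuration file placed next to workflow.xml; a sketch using the same placeholder names (though, as noted, this route did not work here):
<configuration>
    <property>
        <name>endDateTime</name>
        <value>mydefaultvalue</value>
    </property>
</configuration>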

How to get and set the default output directory in Robot Framework (RIDE) at run time

I would like to move all my output files to a custom location: a Run directory created based on the date and time at run time. The datetime output folder is created in the test setup.
I have a function, "Process_Output_files", which moves the files to the Run folder (Run1, Run2, Run3, ...).
I have tried using the -d argument and used "Process_Output_files" as the suite teardown to move the output files to the respective Run directory.
But I get the error "The process cannot access the file because it is being used by another process". I know this is because Robot Framework (RIDE) is currently using the files.
If I don't use the -d argument, the output files are saved in temp folders:
c:\users\<user>\appdata\local\temp\RIDEfmbr9x.d\output.xml
c:\users\<user>\appdata\local\temp\RIDEfmbr9x.d\log.html
c:\users\<user>\appdata\local\temp\RIDEfmbr9x.d\report.html
My question is: is there a way to move the files to a custom location at run time from within Robot Framework?
You can use the following syntax in RIDE (in the Arguments: field) to create the output in new folders dynamically:
--outputdir C:/AutomationLogs/%date:~-4,4%%date:~-10,2%%date:~-7,2% --timestampoutputs
The above syntax gives you the output in folders like the ones below:
Output: C:\AutomationLogs\20151125\output-20151125-155017.xml
Log: C:\AutomationLogs\20151125\log-20151125-155017.html
Report: C:\AutomationLogs\20151125\report-20151125-155017.html
Hope this helps :)
I understand the end result you want is to have your output files in their custom folders. If this is your desire, it can be accomplished at runtime and you won't have to move them as part of your post processing. This will not work in RIDE, unfortunately, since the folder structure is created dynamically. I have two options for you.
Option 1: Use a script to kick off your tests
RIDE is awesome, but in my humble opinion, one shouldn't be using it to run one's tests, only to build and debug them. Scripts are far more powerful and flexible.
Assuming you have a test, test2.txt, you wish to run, the script you use to do this could be something like:
from time import gmtime, strftime
import os

# strftime returns a string representation of a date-time tuple.
# gmtime returns the date-time tuple representing Greenwich Mean Time.
dts = strftime("%Y.%m.%d.%H.%M.%S", gmtime())
cmd = "pybot -d Run%s test2" % (dts,)
os.system(cmd)
As an aside, if you do intend to do post processing of your files using rebot, be aware you may not need to create intermediate log and report files. The output.xml files contain everything you need, so if you don't want to create superfluous files, use --log NONE --report NONE
Option 2: Use a listener to do post processing
A listener is a program you write that responds to events (x_start, x_end, etc). The close() event is akin to the teardown function and is the last thing called. So, assuming you have a function moveFiles() you simply need to create a listener class (myListener), define the close() method to call your moveFiles() function, and alert your test that it should report to a listener with the argument --listener myListener.
This option should be compatible with RIDE though I admit I have never tried to use listeners with the IDE.
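A minimal sketch of such a listener, standing in for the hypothetical moveFiles() with a plain shutil.move into a Run folder (the folder name is a placeholder):
# myListener.py - take into use with: robot --listener myListener.py <tests>
import os
import shutil

class myListener:
    ROBOT_LISTENER_API_VERSION = 2

    def __init__(self, target_dir="Run1"):   # placeholder folder name
        self.target_dir = target_dir
        self.files = []

    # these listener methods receive the real paths of the generated files
    def output_file(self, path):
        self.files.append(path)

    def log_file(self, path):
        self.files.append(path)

    def report_file(self, path):
        self.files.append(path)

    def close(self):
        # close() is the last listener call, after the whole execution has finished,
        # so the files are no longer in use and can safely be moved
        os.makedirs(self.target_dir, exist_ok=True)
        for path in self.files:
            shutil.move(path, os.path.join(self.target_dir, os.path.basename(path)))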
At the very least, you can write a custom run script that handles moving the files after the test execution; by that point the files are no longer in use by pybot.
