How to trigger multiple Oozie coordinators with a different schedule by sharing the common job.properties - oozie

I have a problem where I need to submit multiple coordinators (around 10), each with a different schedule and with no dependency between them (for example, one every 2 hours, another every 12 hours, etc.). I saw there is a limitation that the coordinator definition must be named exactly coordinator.xml, without a prefix or suffix, so I can't differentiate them by name. I don't want to have my code copied 10 times (maintaining a folder for each coordinator) to handle this.
Did anyone have a similar use case? It would be really helpful if you could share your thoughts on this. Thanks!

You can inject a variable:
<coordinator-app name="my_app" frequency="${my_frequency}" ...
Then run Oozie with -Dmy_frequency="* * * * 12", for instance.
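For instance, a single shared job.properties can carry everything common, and each submission overrides only the schedule on the command line. A minimal sketch, assuming cron-style frequencies; host names and paths are placeholders:

# job.properties shared by every coordinator
nameNode=hdfs://namenode:8020
jobTracker=resourcemanager:8032
queueName=default
oozie.coord.application.path=hdfs://namenode:8020/apps/my_app

# submit the same definition several times, overriding only the frequency
oozie job -config job.properties -Dmy_frequency="0 */2 * * *" -run
oozie job -config job.properties -Dmy_frequency="0 */12 * * *" -run

Each submission creates an independent coordinator job, so one coordinator.xml and one job.properties can drive all ten schedules.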

Related

dynamic task id names in Airflow

I have a DAG with one DataflowTemplateOperator that can deal with different JSON files. When I trigger the DAG I pass some parameters via {{dag_run.conf['param1']}} and it works fine.
The issue I have is trying to rename the task_id based on param1.
i.e. task_id="df_operator_read_object_json_file_{{dag_run.conf['param1']}}",
which complains that only alphanumeric characters are allowed,
or
task_id="df_operator_read_object_json_file_{}".format(dag_run.conf['param1']),
which does not recognise dag_run, plus the same alphanumeric issue.
The whole idea behind this is that when I look at the Dataflow jobs console and a job has failed, I know who the offender is based on param1. Dataflow job names are based on the task_id, like this:
df-operator-read-object-json-file-8b9eecec
and what I need is this:
df-operator-read-object-param1-json-file-8b9eecec
Any ideas if this is possible?
There is no need to generate a new operator per file.
DataflowTemplatedJobStartOperator has a job_name parameter which is also templated, so it can be used with Jinja.
I didn't test it, but this should work:
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator

op = DataflowTemplatedJobStartOperator(
    task_id="df_operator_read_object_json_file",
    job_name="df_operator_read_object_json_file_{{dag_run.conf['param1']}}",
    template="gs://dataflow-templates/your_template",
    location="europe-west3",
)

How to vary creation/not creation of node instances during "install" workflow?

The task is: we have a blueprint with all the needed node templates described in it, and we want to create a deployment that includes all these nodes, but we don't want all of them to be created during the "install" workflow.
I mean, e.g. we need to install all nodes in the created deployment except some of them, for example an OpenStack instance's volume.
But we know the volume may need to be created and added later, and we should keep the ability to do so.
Since the volume template expects some input (its name, for example), I want to pass 'null' as that input and NOT have the volume created during the "install" workflow.
Solutions like creating many different blueprints, or deleting some nodes after creation, are not acceptable.
Is that possible, and how might it be done?
I appreciate all your insights.
Thanks in advance!
We've got a similar sort of requirement. Our plan is to use Cloudify 3.4's scaling capability, which is meant for multiple instances but works just as well for 0 or 1 instances.
Supply 0 as the value of the number_of_nodes input to the blueprint below (only tested with a local cfy install, but it should be fine) and the create and start operations will not be called. To instantiate the node post-install you'd use the built-in scale workflow, as sketched after the blueprint. Alternatively, supply 1 at install and the node will be created.
tosca_definitions_version: cloudify_dsl_1_3

imports:
  - http://www.getcloudify.org/spec/cloudify/3.4.1/types.yaml

inputs:
  number_of_nodes:
    default: 0

node_templates:
  some_vm:
    type: cloudify.nodes.Root
    capabilities:
      scalable:
        properties:
          default_instances: { get_input: number_of_nodes }
          max_instances: 1
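A rough sketch of how that could be driven from a local CLI; the exact flags and the scale workflow's parameter names (node_id, delta) are assumptions and may differ between Cloudify versions:

# install with the node scaled to zero instances, so create/start are skipped
cfy local init -p blueprint.yaml -i "number_of_nodes=0"
cfy local execute -w install

# later, bring the node up by scaling it from 0 to 1
cfy local execute -w scale -p '{"node_id": "some_vm", "delta": 1}'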

Oozie workflow to run different set of tables in parallel

I have 3 different sets of tables. The 1st set contains 3 tables, the 2nd set contains 4 tables and the 3rd set contains 5 tables. Now I want all 3 sets to start in parallel (independent of each other) in an Oozie workflow.
Can anyone suggest a sample workflow for the same?
set1     set2     set3
job11    job21    job31
job12    job22    job32
job13    job23    job33
         job24    job34
                  job35
I want the workflow set up in such a way that if any job fails in any one set, the other sets continue and don't fail or wait because of the failed set.
You can use the Fork and Join control nodes of an Oozie workflow. If you want to execute the actions for all the tables in parallel, write an action for each table and add them all to the Fork node; or, if you want to parallelize on a per-set basis, you can do it that way instead.
Sample Workflow from Apache Oozie Documentation:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.5">
...
<fork name="[FORK-NODE-NAME]">
<path start="[NODE-NAME]" />
...
<path start="[NODE-NAME]" />
</fork>
...
<join name="[JOIN-NODE-NAME]" to="[NODE-NAME]" />
...
</workflow-app>
You can find more information here: Fork and Join Control Nodes
What you can do is create 3 separate workflows and set both the
<error to=""> and <ok to=""> transitions of each job action to the next job, simply ignoring any errors and moving through the jobs.
To get the workflows to run in parallel you can use the fork as specified here: Oozie fork specification, or if you have Falcon installed, simply make 3 Falcon processes that are scheduled at the same time. This should provide you with the functionality you need; a sketch of this pattern follows.
The only issue with this is that you have no real way of tracking whether any of the jobs failed or not.
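As a sketch of how the fork and the "ignore errors" transitions fit together in a single workflow (action names, the hive action body, and the .hql scripts are placeholders, not from the question):

<fork name="fork-sets">
    <path start="job11" />
    <path start="job21" />
    <path start="job31" />
</fork>

<!-- set1: every action moves on to the next one whether it succeeds or fails -->
<action name="job11">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <script>load_table11.hql</script>
    </hive>
    <ok to="job12"/>
    <error to="job12"/>
</action>
<!-- job12 transitions to job13 the same way; the last action of each set
     points to the join node on both ok and error -->
<action name="job13">
    ...
    <ok to="join-sets"/>
    <error to="join-sets"/>
</action>

<join name="join-sets" to="end"/>

Because no action ever transitions to a kill node, a failure in one set never stops the other forked paths.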

How to get and set the default output directory in Robot Framework(Ride) in Run time

I would like to move all my output files to a custom location: a Run directory created based on date and time at run time. The output folder named by datetime is created in the test setup.
I have a function "Process_Output_files" which will move the files to the Run folder (Run1, Run2, Run3 folders).
I have tried using the -d argument and calling "Process_Output_files" as the suite teardown to move the output files to the respective Run directory.
But I get the following error: "The process cannot access the file because it is being used by another process". I know this is because Robot Framework (RIDE) is still using them.
If I don't use the -d argument, the output files are saved in temp folders:
c:\users\<user>\appdata\local\temp\RIDEfmbr9x.d\output.xml
c:\users\<user>\appdata\local\temp\RIDEfmbr9x.d\log.html
c:\users\<user>\appdata\local\temp\RIDEfmbr9x.d\report.html
My question is: is there a way to move the files to a custom location at run time from within Robot Framework?
You can use the following syntax in RIDE (Arguments:) to create the output in new folders dynamically:
--outputdir C:/AutomationLogs/%date:~-4,4%%date:~-10,2%%date:~-7,2% --timestampoutputs
The above syntax gives you the output in the folders below:
Output: C:\AutomationLogs\20151125\output-20151125-155017.xml
Log: C:\AutomationLogs\20151125\log-20151125-155017.html
Report: C:\AutomationLogs\20151125\report-20151125-155017.html
Hope this helps :)
I understand the end result you want is to have your output files in their custom folders. If this is your desire, it can be accomplished at runtime and you won't have to move them as part of your post-processing. This will not work in RIDE, unfortunately, since the folder structure is created dynamically. I have two options for you.
Option 1: Use a script to kick off your tests
RIDE is awesome, but in my humble opinion one shouldn't be using it to run one's tests, only to build and debug them. Scripts are far more powerful and flexible.
Assuming you have a test, test2.txt, that you wish to run, the script you use to do this could be something like:
from time import gmtime, strftime
import os

# strftime returns a string representation of a date-time tuple.
# gmtime returns the date-time tuple representing Greenwich Mean Time.
dts = strftime("%Y.%m.%d.%H.%M.%S", gmtime())
cmd = "pybot -d Run%s test2.txt" % (dts,)
os.system(cmd)
As an aside, if you do intend to post-process your files using rebot, be aware that you may not need to create intermediate log and report files. The output.xml file contains everything you need, so if you don't want to create superfluous files, use --log NONE --report NONE.
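For example (the Run1 folder name is just a placeholder):

# run producing only output.xml, then build log and report later with rebot
pybot --log NONE --report NONE -d Run1 test2.txt
rebot --outputdir Run1 Run1/output.xml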
Option 2: Use a listener to do post-processing
A listener is a program you write that responds to events (x_start, x_end, etc.). The close() event is akin to the teardown function and is the last thing called. So, assuming you have a function moveFiles(), you simply need to create a listener class (myListener), define the close() method to call your moveFiles() function, and tell your test run to report to the listener with the argument --listener myListener.
This option should be compatible with RIDE, though I admit I have never tried to use listeners with the IDE.
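As a rough sketch (the moveFiles() body and the Run1 destination are placeholders for your own logic; the methods shown are the standard version 2 listener API):

# myListener.py
import os
import shutil

class myListener(object):
    ROBOT_LISTENER_API_VERSION = 2

    def __init__(self, run_dir="Run1"):
        # destination folder is a placeholder; it can be passed as
        # --listener myListener:SomeOtherFolder
        self.run_dir = run_dir
        self.files = []

    # these methods are called once each output file has been written
    def output_file(self, path):
        self.files.append(path)

    def log_file(self, path):
        self.files.append(path)

    def report_file(self, path):
        self.files.append(path)

    def close(self):
        # close() is the last listener call, so the files are no longer locked
        self.moveFiles()

    def moveFiles(self):
        if not os.path.isdir(self.run_dir):
            os.makedirs(self.run_dir)
        for path in self.files:
            if path and os.path.exists(path):
                shutil.move(path, self.run_dir)

Run it with something like pybot --listener myListener test2.txt, with myListener.py on the PYTHONPATH.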
Alternatively, you can write a custom run script that handles moving the files after the test execution. In that case the files are no longer in use by pybot.

Autosys file watcher for a particular filename on Windows

I am trying to write a file watcher job in Autosys that would watch for a particular file. The file name format would be filename_ddmmyyyy.
The requirement is that the file arrives at 7:15am every day; the file watcher job starts running at 6:50am and runs till 8am. If the file has been received by then, the job is successful, otherwise an alert is raised.
Now what I am trying to do is to watch for the file filename_ddmmyyyy for a particular day. E.g. if today is 22nd Feb 2013, the file name will be filename_22022013 and this is the file that I am looking for. If I use wildcards like filename_*, it would look for all possible files, which I don't want.
I am not sure how to do this on Windows.
Any help would be much appreciated.
Let me know in case of questions.
You will need to use the profile job attribute to initialize variables when the job starts. One of these variables will need to be the date pattern you are looking for (you'll need another process that outputs that dynamically). Then, once you set it to a variable in your profile script, you can refer to that variable name from within the watch_file attribute.
Create a global variable holding the date and use that variable in the file name, for example:
filename_$${GV_DATE}
where GV_DATE has the format ddmmyyyy.
Pretty late to answer, but here is an answer without using a global variable. You can use a formatted system date variable in the file name:
File_to_watch: filename_%date:~10,4%%date:~4,2%%date:~7,2%
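Put together as JIL with the global-variable approach, the watcher could look roughly like this (job and machine names, the path, and GV_DATE are assumptions; double-check the attribute names against your Autosys version):

/* file watcher that starts at 06:50 and gives up after 70 minutes */
insert_job: watch_daily_file
job_type: f
machine: winhost01
watch_file: C:\data\filename_$${GV_DATE}
watch_interval: 60
date_conditions: 1
days_of_week: all
start_times: "06:50"
term_run_time: 70
alarm_if_fail: 1

/* an earlier job or script sets the date each day, e.g. */
/* sendevent -E SET_GLOBAL -G "GV_DATE=22022013" */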
