I have 3 different sets of tables. The 1st set contains 3 tables, the 2nd set contains 4 tables and the 3rd set contains 5 tables. I want all 3 sets to start in parallel (independent of each other) in an Oozie workflow.
Can anyone suggest a sample workflow for this?
set1     set2     set3
job11    job21    job31
job12    job22    job32
job13    job23    job33
         job24    job34
                  job35
I want the workflow set up in such a way that if any job in one set fails, the other sets continue to run and neither fail nor wait because of the failed set.
You can use the Fork and Join control nodes of an Oozie workflow. If you want to execute the actions for all the tables in parallel, write an action for each table and add them all to the fork node; if you only want to parallelize at the set level, fork one path per set instead.
Sample workflow from the Apache Oozie documentation:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.5">
    ...
    <fork name="[FORK-NODE-NAME]">
        <path start="[NODE-NAME]" />
        ...
        <path start="[NODE-NAME]" />
    </fork>
    ...
    <join name="[JOIN-NODE-NAME]" to="[NODE-NAME]" />
    ...
</workflow-app>
You can find more information here: Fork and Join Control Nodes
What you can do is create 3 separate workflows and set both the <error to=""> and <ok to=""> transitions of each job action to the next job, simply ignoring any errors and moving on through the jobs.
To get the workflows to run in parallel, you can use a fork as specified here: Oozie fork specification. Alternatively, if you have Falcon installed, simply create 3 Falcon processes scheduled at the same time. This should give you the functionality you need.
The only issue with this is that you have no way of really tracking whether any of the jobs failed or not.
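Putting the two suggestions together, here is a minimal sketch (the action names, the choice of shell actions and the script names are placeholders, not a tested workflow): each set becomes one forked path, and every action's <error> transition simply points to the next job of its own set (or to the join for the last job), so a failure inside one set neither stops that set's remaining jobs nor affects the other sets.
<workflow-app name="three-sets-parallel" xmlns="uri:oozie:workflow:0.5">
    <start to="fork-sets"/>
    <fork name="fork-sets">
        <path start="job11"/>
        <path start="job21"/>
        <path start="job31"/>
    </fork>
    <!-- Set 1: job11 -> job12 -> job13; on error we still move to the next job -->
    <action name="job11">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>load_table_11.sh</exec>
            <file>load_table_11.sh#load_table_11.sh</file>
        </shell>
        <ok to="job12"/>
        <error to="job12"/>
    </action>
    <!-- job12 and job13 are defined the same way; job13 goes to "join-sets" on both ok and error.
         Set 2 (job21..job24) and set 3 (job31..job35) follow the same pattern on their own paths. -->
    <join name="join-sets" to="end"/>
    <end name="end"/>
</workflow-app>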
I am attempting to write a tool that will automate the generation of a Visual Studio test playlist based on the failed tests from the SpecFlow report. We recently increased our testThreadCount to 4, and when using the LivingDocumentation plugin to generate the TestExecution.json file it only generates a result for 1 in 4 tests. I think this is due to the thread count, so 4 tests are being seen as a single execution.
My aim is to generate a fully qualified test name for each of the failed tests using the TestExecution file, but this will not work if I am only generating 25% of the results. Could I ask if anyone has an idea of a workaround for this?
<Execution stopAfterFailures="0" testThreadCount="4" testSchedulingMode="Sequential" retryFor="Failing" retryCount="0" />
These are our current execution settings in the .srprofile.
We made this possible with the latest version of SpecFlow and the SpecFlow+ LivingDoc Plugin.
You can configure the filename for the TestExecution.json via specflow.json.
Here is an example:
{
  "livingDocGenerator": {
    "enabled": true,
    "filePath": "TestExecution_{ProcessId}_{ThreadId}.json"
  }
}
ProcessId and ThreadId will be replaced with their actual values, so you get a separate TestExecution.json for every thread.
You can then give the livingdoc CLI tool or the Azure DevOps task a list of TestExecution.json files.
Example:
livingdoc test-assembly BookShop.AcceptanceTests.dll -t TestExecution*.json
This generates one LivingDoc with all the test execution results combined.
Documentation links:
https://docs.specflow.org/projects/specflow-livingdoc/en/latest/LivingDocGenerator/Setup-the-LivingDocPlugin.html
https://docs.specflow.org/projects/specflow-livingdoc/en/latest/Guides/Merging-Multiple-test-results.html
I have a problem where I need to submit multiple coordinators (around 10), each with a different schedule and with no dependency between them (e.g. one every 2 hrs, one every 12 hrs, etc.). I saw there is a limitation that the coordinator must be named exactly coordinator.xml, without a prefix or suffix, so I can't differentiate them. I don't want to have my code copied 10 times (maintaining a folder for each coordinator) to handle this.
sample folder with multiple coordinators
Did anyone have a similar use case? It would be really helpful if you share your thoughts on this. Thanks!
You can inject a variable:
<coordinator-app name="my_app" frequency="${my_frequency}" ...
Then run oozie with -Dmy_frequency="* * * * 12" for instance
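For example (a sketch, not tested: the job.properties contents and the cron strings below are assumptions standing in for your own schedules), you could keep a single application folder with one parameterized coordinator.xml and submit it several times, once per schedule:
# assumes OOZIE_URL is exported; otherwise add -oozie http://<oozie-host>:11000/oozie
oozie job -config job.properties -Dmy_frequency="0 */2 * * *" -run
oozie job -config job.properties -Dmy_frequency="0 */12 * * *" -run
Each submission becomes an independent coordinator job, so there is no need to copy the folder 10 times; only the injected frequency differs.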
From the documentation, config-default.xml must be present in the workflow workspace:
- /workflow.xml
- /config-default.xml
|
- /lib/ (*.jar;*.so)
The problem
I've created a custom Oozie action and am trying to add default values for retry-max and retry-interval to all the custom actions.
So my workflow.xml will look like this:
<workflow-app xmlns="uri:oozie:workflow:0.3" name="wf-name">
    <action name="custom-action" retry-max="${default_retry_max}" retry-interval="${default_retry_interval}">
    </action>
config-default.xml file contains the values of default_retry_max and default_retry_interval.
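For reference, config-default.xml is a plain Hadoop-style configuration file; the values below are just example numbers, not recommendations:
<configuration>
    <property>
        <name>default_retry_max</name>
        <value>3</value>
    </property>
    <property>
        <name>default_retry_interval</name>
        <value>1</value>
    </property>
</configuration>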
What I've tried
Putting config-default.xml into every workflow workspace. This works, but the problem is that the file then has to be duplicated everywhere.
Setting oozie.service.LiteWorkflowStoreService.user.retry.max and oozie.service.LiteWorkflowStoreService.user.retry.inteval also works, but it would affect all action types.
I've also looked at Global Configurations, but it doesn't solve this problem.
I think there should be a way to put config-default.xml into oozie.libpath so that only those workflows that use this libpath are affected.
AFAIK, there is unfortunately no clean way to do it.
You might be interested in this recently created feature request: https://issues.apache.org/jira/browse/OOZIE-3179
The only thing that worked for me was to begin the workflow with a shell step that uses a script stored in HDFS. This script holds the centralized configuration. The script would look like this:
#!/bin/sh
echo "oozie.use.system.libpath=true"
echo "hbase_zookeeper_quorum=localhost"
# ... other system or custom variables ...
Yes, the script simply prints the variables to stdout.
Let's say the shell step action is called "global_config". All following steps are able to get the variables using the following syntax:
${wf:actionData('global_config')['hbase_zookeeper_quorum']}
HTH...
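For completeness, here is a sketch of what the "global_config" action itself could look like (the HDFS path and the transition targets are placeholders); the <capture-output/> element is what makes the echoed key=value lines available to wf:actionData():
<action name="global_config">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>global_config.sh</exec>
        <file>${nameNode}/apps/conf/global_config.sh#global_config.sh</file>
        <capture-output/>
    </shell>
    <ok to="first-real-step"/>
    <error to="kill"/>
</action>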
The task is: we have a blueprint with all the needed node templates described in it, and we want to create a deployment that includes all these nodes, but we don't want all of them to be created during the "install" workflow.
I mean, for example, we need to install all nodes in the created deployment except some of them, such as an OpenStack instance's volume.
But we know the volume may need to be created and added later, and we should keep the ability to do so.
Since the volume template expects some input (its name, for example), I want to pass 'null' as the input and NOT have the volume created during the "install" workflow.
Solutions like creating many different blueprints, or deleting some nodes after creation, are not acceptable.
Is that possible, and how can it be done?
I appreciate all your insights
Thanks in advance!
We've got a similar sort of requirement. Our plan is to use Cloudify 3.4's scaling capability - which is supposed to be used for multiple instances, but works just as well for just 0 or 1 instances.
Supply 0 as the value for the number_of_nodes input into the blueprint below - only tested with a local cfy install (but should be fine) - and the create & start operations will not be called. To instantiate the node post-install you'd use the built-in scale workflow. Alternatively, supply 1 at install and the node will be created.
tosca_definitions_version: cloudify_dsl_1_3

imports:
  - http://www.getcloudify.org/spec/cloudify/3.4.1/types.yaml

inputs:
  number_of_nodes:
    default: 0

node_templates:
  some_vm:
    type: cloudify.nodes.Root
    capabilities:
      scalable:
        properties:
          default_instances: { get_input: number_of_nodes }
          max_instances: 1
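To bring the node up after install you would then run the built-in scale workflow. A hypothetical invocation is shown below; the flag and workflow parameter names are assumptions and vary between Cloudify/CLI versions, so check cfy executions start --help for your setup:
cfy executions start -w scale -d <deployment-id> -p '{"node_id": "some_vm", "delta": 1}'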
I would like to move all my output files to a custom location: a Run directory created based on the date and time at run time. The datetime output folder is created in the test setup.
I have a function "Process_Output_files" which moves the files to the Run folder (Run1, Run2, Run3 folders).
I have tried using the -d argument and used the function "Process_Output_files" as the suite teardown to move the output files to the respective Run directory.
But I get the following error: "The process cannot access the file because it is being used by another process". I know this is because Robot Framework (RIDE) is still using the files.
If I don't use the -d argument, the output files get saved in temp folders:
c:\users\<user>\appdata\local\temp\RIDEfmbr9x.d\output.xml
c:\users\<user>\appdata\local\temp\RIDEfmbr9x.d\log.html
c:\users\<user>\appdata\local\temp\RIDEfmbr9x.d\report.html
My question is: is there a way to move the files to a custom location during run time within Robot Framework?
You can use the following syntax in RIDE (Arguments:) to create the output in new folders dynamically:
--outputdir C:/AutomationLogs/%date:~-4,4%%date:~-10,2%%date:~-7,2% --timestampoutputs
The above syntax gives you the output in the folders below:
Output: C:\AutomationLogs\20151125\output-20151125-155017.xml
Log: C:\AutomationLogs\20151125\log-20151125-155017.html
Report: C:\AutomationLogs\20151125\report-20151125-155017.html
Hope this helps :)
I understand the end result you want is to have your output files in their custom folders. If this is your desire, it can be accomplished at runtime and you won't have to move them as part of your post processing. This will not work in RIDE, unfortunately, since the folder structure is created dynamically. I have two options for you.
Option 1: Use a script to kick off your tests
RIDE is awesome, but in my humble opinion, one shouldn't be using it to run one's tests, only to build and debug them. Scripts are far more powerful and flexible.
Assuming you have a test, test2.txt, that you wish to run, the script you use to do this could be something like:
from time import gmtime, strftime
import os
#strftime returns string representations of a date-time tuple.
#gmtime returns the date-time tuple representing greenwich mean time
dts=strftime("%Y.%m.%d.%H.%M.%S", gmtime())
cmd="pybot -d Run%s test2"%(dts,)
os.system(cmd)
As an aside, if you do intend to do post processing of your files using rebot, be aware you may not need to create intermediate log and report files. The output.xml files contain everything you need, so if you don't want to create superfluous files, use --log NONE --report NONE
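For instance, a hypothetical follow-up call (the folder name is a placeholder for whatever Run directory the script created) would rebuild the log and report from that run's output.xml later:
rebot --outputdir Run<timestamp> Run<timestamp>/output.xml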
Option 2: Use a listener to do post processing
A listener is a program you write that responds to events (x_start, x_end, etc). The close() event is akin to the teardown function and is the last thing called. So, assuming you have a function moveFiles() you simply need to create a listener class (myListener), define the close() method to call your moveFiles() function, and alert your test that it should report to a listener with the argument --listener myListener.
This option should be compatible with RIDE though I admit I have never tried to use listeners with the IDE.
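Here is a minimal sketch of such a listener (the file name myListener.py, the class name and the Run1 destination folder are placeholder assumptions, not part of the original answer). Listener API version 2 passes the final output, log and report paths to dedicated methods, and close() is called after those files are finished, so moving them there avoids the "file in use" error:
import os
import shutil


class myListener(object):
    ROBOT_LISTENER_API_VERSION = 2

    def __init__(self, target_dir="Run1"):
        # target_dir is a hypothetical destination folder; it can be passed
        # on the command line, e.g. --listener myListener.py:Run5
        self.target_dir = target_dir
        self.files = []

    def output_file(self, path):
        # Called when output.xml has been written.
        self.files.append(path)

    def log_file(self, path):
        # Called when log.html has been written.
        self.files.append(path)

    def report_file(self, path):
        # Called when report.html has been written.
        self.files.append(path)

    def close(self):
        # Called last of all, so the files are no longer in use and can be moved.
        if not os.path.isdir(self.target_dir):
            os.makedirs(self.target_dir)
        for f in self.files:
            if os.path.isfile(f):
                shutil.move(f, self.target_dir)
You would then start the run with something like pybot --listener myListener.py test2.txt.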
At the very least, you can write a custom run script that handles moving the files after the test execution. In that case the files are no longer being used by pybot.