I want to know if I can add prerequisite conditions for a job based on server availability. Suppose Job J runs from job server JS, and it interacts with two other servers SERVER1 and SERVER2.
I want to configure job J such that it runs only when SERVER1 and SERVER2 are available. If either of the two servers is down, the job should wait for the servers to come back online.
I don't know if this should be a comment or an answer, but what you are looking for is not natively available within Control-M.
The simplest solution I can think of for you is to configure a sleep job to run on SERVER1 and SERVER2 and have them as predecessors to job J. These sleep jobs will only run when the agents on SERVER1/2 are available, therefore confirming server availability prior to execution of job J.
Alternatively, you could write a script that loops until SERVER1/2 respond to pings and then completes, and configure this job as a predecessor to job J.
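For illustration, here is a minimal sketch of such a wait-for-ping script in Python (the host names, retry interval and ping flags are placeholders to adjust for your environment):

    # wait_for_servers.py - minimal sketch; host names and retry interval are
    # placeholders. Run it as the OS command of the predecessor job.
    import subprocess
    import sys
    import time

    SERVERS = ["SERVER1", "SERVER2"]   # hosts job J depends on
    RETRY_SECONDS = 60                 # pause between ping rounds

    def is_up(host):
        # "ping -c 1" on Linux agents; use "ping -n 1" on Windows agents
        return subprocess.call(["ping", "-c", "1", host],
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL) == 0

    while True:
        down = [h for h in SERVERS if not is_up(h)]
        if not down:
            sys.exit(0)                # all servers reachable, job ends OK
        print("still waiting for: " + ", ".join(down))
        time.sleep(RETRY_SECONDS)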
I'm still a newbie in Control-M, but we have implemented a solution with similar goals using a job hook that checks nodes.
Assume your target server (node) is called JS and it interacts with SERVER1 (let's call it node01). Any number of servers/nodes can be added later; let's look at just one node.
Overview of the components:
Jobs are created to monitor changes and to check the status both while the node is OK and while it is NOT OK
A quantitative resource is created for each node, for example node01_run (or stacked, as you wish)
Jobs include the quantitative resource "node01_run" and require at least 1 free resource
If everything is OK, the jobs run as expected
If downtime is detected, the quantitative resource (QR) is set to 0, so the affected jobs will not run
If the node is up again, the quantitative resource is set back to its original value (10, 100, 1000, ...) and the jobs run again as usual
Job name: node01_check_resource
Job Goal ---> Check whether the quantitative resource already exists
Job OS Command ---> ecaqrtab LIST node01_run
Result yes ---> do nothing
Result no ---> Job node01_create_resource, command: ecaqrtab ADD node01_run 100 (or as many as you wish)
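As an illustration only, both steps could also live in a single script; the exact output of ecaqrtab LIST is an assumption here, so the check simply looks for the resource name in it:

    # check_or_create_qr.py - sketch only; assumes ecaqrtab is on the agent's
    # PATH and that "ecaqrtab LIST" prints the resource name when it exists.
    import subprocess
    import sys

    QR_NAME = "node01_run"
    QR_MAX = "100"   # or as many as you wish

    listing = subprocess.run(["ecaqrtab", "LIST", QR_NAME],
                             capture_output=True, text=True).stdout
    if QR_NAME in listing:
        sys.exit(0)   # resource already exists -> do nothing
    # resource missing -> create it
    subprocess.run(["ecaqrtab", "ADD", QR_NAME, QR_MAX], check=True)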
Job name: node01_check (cyclic)
Job Goal ---> Check if node up
Job OS Command ---> However you define that your node is up: check a webservice, check uptime, a WMI result, ping, ...
Result up ---> rerun the job in x minutes
Result down ---> go to the next job
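For example, a plain ping probe could look like the sketch below (any of the other checks mentioned above could replace it); the job's On-Do actions then decide, based on the exit code, whether to rerun or to move on to the next job:

    # node01_check.py - sketch of one possible probe; a webservice call,
    # uptime query or WMI check could be substituted for the ping.
    import subprocess
    import sys

    NODE = "node01"   # hostname of SERVER1

    up = subprocess.call(["ping", "-c", "1", NODE],
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL) == 0
    sys.exit(0 if up else 1)   # 0 = node up, 1 = node down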
Job name: node01_up-down
Job Goal ---> Case for switching between status up and status down
Job OS Command ---> ecaqrtab UPDATE node01_run 0
On-Do action ---> when the job has ended, remove the condition so that node01_check can no longer start (since it is defined as a cyclic job, it would otherwise keep rerunning)
Job name: node01_check_down (cyclic)
Job Goal ---> Check status while known status is down
Job OS Command ---> As defined in node01_check
Result down ---> Do nothing, as the job is defined as cyclic
Result up ---> Remove some conditions
Job name: node01_down-up
Job Goal ---> Switch everything back to normal operation
Job OS Command ---> ecaqrtab UPDATE node01_run 100
Action ---> Add the condition back so that node01_check can run again
You can define such job hooks for multiple nodes, and in each job you can define which nodes should be up and running (meaning where the quantitative resource is higher than 0). Multiple hosts can be monitored while still setting the same resource, as you wish.
I hope this helps further, unless you have found a suitable solution already.
I am trying to see if Airflow is the right tool for some functionality I need for my project. We are trying to use it as a scheduler for running a sequence of jobs
that start at a particular time (or possibly on demand).
The first "task" is to query the database for the list of job id's to sequence through.
For each job in the sequence send a REST request to start the job
Wait until job completes or fails (via REST call or DB query)
Go to next job in sequence.
I am looking for recommendations on how to break down the functionality discussed above into an Airflow DAG. So far my approach would be:
create a Hook for the database and another for the REST server.
create a custom operator that handles the start and monitoring of the "job" (steps 2 and 3)
use a sensor to poll and handle waiting for the job to complete (a rough sketch follows this list)
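A rough sketch of how that breakdown could look (Airflow 2-style imports; the REST endpoints, job ids and class names are all hypothetical, and in practice the id list would come from the database hook rather than being hard-coded):

    # sketch only - URLs, job ids and class names are hypothetical
    from datetime import datetime

    import requests
    from airflow import DAG
    from airflow.models.baseoperator import BaseOperator
    from airflow.sensors.base import BaseSensorOperator

    API = "http://jobserver.example.com/api"   # hypothetical REST server

    class StartJobOperator(BaseOperator):
        """Start one job via REST (step 2)."""
        def __init__(self, job_id, **kwargs):
            super().__init__(**kwargs)
            self.job_id = job_id

        def execute(self, context):
            requests.post(f"{API}/jobs/{self.job_id}/start").raise_for_status()

    class JobCompleteSensor(BaseSensorOperator):
        """Poll until the job completes; fail the task if the job failed (step 3)."""
        def __init__(self, job_id, **kwargs):
            super().__init__(**kwargs)
            self.job_id = job_id

        def poke(self, context):
            status = requests.get(f"{API}/jobs/{self.job_id}").json()["status"]
            if status == "failed":
                raise RuntimeError(f"job {self.job_id} failed")
            return status == "completed"

    with DAG("job_sequence", start_date=datetime(2021, 1, 1),
             schedule_interval="@daily", catchup=False) as dag:
        previous = None
        for job_id in ["job_a", "job_b", "job_c"]:   # would come from the DB hook
            start = StartJobOperator(task_id=f"start_{job_id}", job_id=job_id)
            wait = JobCompleteSensor(task_id=f"wait_{job_id}", job_id=job_id,
                                     poke_interval=60)
            start >> wait
            if previous:
                previous >> start   # run the jobs strictly in sequence (step 4)
            previous = wait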
Thanks
I have an airflow instance that had been running with no problem for 2 months until Sunday. There was a blackout in a system on which my airflow tasks depend, and some tasks were queued for 2 days. After that we decided it was better to mark all the tasks for that day as failed and just lose that data.
Nevertheless, now all the new tasks get triggered at the proper time but they are never set to any state (neither queued nor running). I checked the logs and I see this output:
Dependencies Blocking Task From Getting Scheduled
All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:
- The scheduler is down or under heavy load
- The following configuration values may be limiting the number of queueable processes: parallelism, dag_concurrency, max_active_dag_runs_per_dag, non_pooled_task_slot_count
- This task instance already ran and had its state changed manually (e.g. cleared in the UI)
I get the impression the third item is the reason why it is not working.
The scheduler and the webserver were working; however, I restarted the scheduler and I am still getting the same outcome. I also deleted the data in the MySQL database for one job and it is still not running.
I also saw a couple of posts that said it is not running because depends_on_past was set to true, and if the previous runs failed the next one will never be executed. I checked that and it is not my case.
Any input would be really appreciated.
Any ideas? Thanks
While debugging a similar issue I found this setting: AIRFLOW__SCHEDULER__MAX_DAGRUNS_PER_LOOP_TO_SCHEDULE (see http://airflow.apache.org/docs/apache-airflow/2.0.1/configurations-ref.html#max-dagruns-per-loop-to-schedule). Checking the Airflow code, it seems that the scheduler queries for DAG runs to examine (i.e. to consider running task instances for), and this query is limited to that number of rows (20 by default). So if you have more than 20 DAG runs that are in some way blocked (in our case because task instances were in up_for_retry), the scheduler won't consider other DAG runs even though those could run fine.
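For reference, the limit can be raised in airflow.cfg (or through the environment variable mentioned above); the value 100 below is just an example:

    # airflow.cfg - raise how many dag runs the scheduler examines per loop
    [scheduler]
    max_dagruns_per_loop_to_schedule = 100

    # or equivalently, as an environment variable:
    # AIRFLOW__SCHEDULER__MAX_DAGRUNS_PER_LOOP_TO_SCHEDULE=100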
I have a task that's scheduled to run hourly, however it's not being triggered. When I look at the Task Instance Details it says:
All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:
- The scheduler is down or under heavy load
- The following configuration values may be limiting the number of queueable processes: parallelism, dag_concurrency, max_active_dag_runs_per_dag, non_pooled_task_slot_count
- This task instance already ran and had its state changed manually (e.g. cleared in the UI)
If this task instance does not start soon please contact your Airflow administrator for assistance.
If I clear the task in the UI I am able to execute it through terminal but it does not run when scheduled.
Why do I have to manually clear it after every run?
My level of experience with the product is basic at best, but I'm expected to be a developer; I have a basic understanding of many things.
Right now my job is to investigate canceling lines in Purchase Orders. We have a workflow set up to handle those, and I'm trying to duplicate the scenario in my dev instance. Whenever a user cancels a line, the workflow is supposed to engage, and I've found that a batch job is what triggers that workflow to run (maybe that's the case with all workflows, but I don't know that for sure).
I've set up my personal Dev AX Instance under System Configuration => System => Server Configuration to use my personal Dev AOS server that my client is also running under, but when I go to System Configuration => Batch Jobs => Batch Jobs, then find the Batch Job I've been looking for and set the status to Waiting, the Batch Job never runs.
On our Test instance, the job is configured exactly the same way, except it uses the AOS server allotted for it.
I ran a SQL script to change the batch job to use my personal Dev AOS server, then restarted the Dynamics AX servers.
There must be something I'm doing wrong with my personal dev instance. I've been reading about what may be going on and following down the list from here, but I'm pretty sure the problem is something even sillier => https://www.daxrunbase.com/2017/07/02/troubleshooting-batch-jobs-in-ax/
First of all, do you have all 3 workflow jobs set up?
Workflow message processing
Workflow due date processing
Workflow line-item notifications
They can be set up from System administration > Setup > Workflow > Workflow infrastructure configuration.
Secondly, it is OK for the periodic batch jobs to have status Waiting. They will be in status Executing for a short time and then they will be Waiting for the next run. If the Scheduled start date/time value in this batch job is in the past, that could be a problem. Otherwise everything is OK.
Lastly, if you have already ticked the Is batch server check-box in System administration > Setup > System > Server configuration, please also make sure to move the workflow batch group in the Batch server groups section in the same form from Remaining groups to Selected groups.
The batch jobs should start at Scheduled start date/time - or a bit later, you'd need to wait a minute and refresh the grid.
As part of a batch job, I create 4 command lines through Control-M which invoke a legacy console application written in VB6. The console application invokes an ActiveX server which performs a set of analytic jobs calculating outputs. The ActiveX server was coded as a singleton, but when invoked through Control-M I get 4 instances running. The ActiveX server does not tear down once the job has completed and the command line has closed itself.
I created 4 .bat files which, once launched manually on the server, simulate the calls made through Control-M, and the ActiveX server behaves as expected, i.e. there is only 1 instance ever running and once complete it closes down gracefully.
What am I doing wrong?
Control-M jobs run under a service account, which is the same as logging in as a user and executing a job. How did you test this? Did you manually execute each batch job one after another, or did you execute all the batch jobs at the same time from different terminals? You can try one thing: run the Control-M jobs with a time interval, e.g. the first one at 09:00, the second at 09:05, the third at 09:10 and the fourth at 09:15, and see if that fixes your issue.
Maybe your job cannot use the Desktop environment.
Check your agent service settings:
Log on As:
User account under which Control‑M Agent service will run.
Valid values:
Local System Account – Service logs on as the system account.
Allow Service to Interact with Desktop – This option is valid only if the service is running as a local system account.
Selected – the service provides a user interface on a desktop that can be used by whoever is logged in when the service is started. Default.
Unselected – the service does not provide a user interface.
This Account – User account under which Control‑M Agent service will run.
NOTE: If the owner of any Control-M/Server jobs has a "roaming profile" or if job output (OUTPUT) will be copied to or from other computers, the Log in mode must be set to This Account.
Default: Local System Account