To run batch jobs one after the other - Axapta

I am submitting jobs to the batch process one after the other.
How do I ensure that the second batch job runs only when the first one has finished?
Right now both jobs execute simultaneously, which I don't want to happen.

There are two options: you can do this through code, or via manual setup. The manual method is fairly easy. Go to Basic > Inquiries > Batch Job, create a new batch job and save it. Then click "View Tasks" and create a new task; this will be your first batch task. Choose your class, description, batch group, etc., then save. Click "Parameters" to set up the parameters.
After that, you can set up your dependent task. Make sure both tasks have descriptions. Add your second batch task and save. Then, in the lower-left corner, click on the task that should wait on a condition, add a row there, and set up your conditions so that the second task won't start until the first has completed.
Via X++ code, you would create a BatchHeader where you set up basically the same thing we just did manually. You use the .addDependency method to make one task dependent on the completion of the other. This walkthrough will get you started with a job that creates the batch header; you'll just have to experiment to get the dependency working.
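For illustration, a minimal X++ sketch of that approach might look like the following (FirstBatchTask and SecondBatchTask are hypothetical placeholders for your own RunBaseBatch classes):

    static void scheduleDependentBatchJobs(Args _args)
    {
        BatchHeader     batchHeader;
        FirstBatchTask  firstTask;   // hypothetical RunBaseBatch class
        SecondBatchTask secondTask;  // hypothetical RunBaseBatch class

        firstTask  = new FirstBatchTask();
        secondTask = new SecondBatchTask();

        // Create the batch job (header) that will hold both tasks
        batchHeader = BatchHeader::construct();
        batchHeader.parmCaption("Two tasks, the second runs after the first");

        batchHeader.addTask(firstTask);
        batchHeader.addTask(secondTask);

        // The second task only starts once the first task has finished
        batchHeader.addDependency(secondTask, firstTask, BatchDependencyStatus::Finished);

        batchHeader.save();
    }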

Related

Airflow dynamic dag creation

Can someone please tell me whether a DAG in Airflow is just a graph (like a placeholder) without any actual data (like arguments) associated with it, or whether a DAG is like an instance (for a fixed argument)?
I want a system where the set of operations to perform (given a set of arguments) is fixed, but the input will be different every time the set of operations is run. In simple terms, the pipeline is the same but the arguments to the pipeline will be different every time it is run.
How do I configure this in Airflow? Should I create a new DAG for every new set of arguments, or is there another method?
In my case, the graph is the same but I want to run it on different data (from different users) as it comes in. So, should I create a new DAG every time there is new data?
Yes, you are correct; a DAG is basically a kind of one-way graph. You can create a DAG once by chaining multiple operators together to form your "structure".
Each operator can then take in multiple arguments that you can pass from the DAG definition file itself (if needed).
Or you can pass in a configuration object to the DAG, and access custom data from there using the context.
I would recommend reading the Airflow Docs for more examples: https://airflow.apache.org/concepts.html#tasks
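To make that concrete, here is a minimal sketch (assuming Airflow 1.x; the DAG id, task id and configuration keys are made up for illustration) of a task that reads custom data from the dag_run configuration via the context:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator


    def process_user_data(**context):
        # Read the arguments that were passed when this DAG run was triggered
        conf = context["dag_run"].conf or {}
        user_id = conf.get("user_id")        # hypothetical argument
        input_path = conf.get("input_path")  # hypothetical argument
        print("Processing %s for user %s" % (input_path, user_id))


    dag = DAG(
        dag_id="user_pipeline",            # hypothetical DAG id
        start_date=datetime(2019, 1, 1),
        schedule_interval=None,            # run only when triggered
    )

    process = PythonOperator(
        task_id="process_user_data",
        python_callable=process_user_data,
        provide_context=True,              # needed on Airflow 1.x
        dag=dag,
    )

The same DAG can then be triggered with different arguments each time, e.g. airflow trigger_dag -c '{"user_id": 123, "input_path": "s3://bucket/file.csv"}' user_pipeline.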
You can think of an Airflow DAG as a program made of other programs, with the exception that it can't contain loops (it is acyclic). Will you change your program every time the input data changes? It all depends on how you write your program, but usually you'd like your program to generalise, right? You don't want two different programs to compute 2+2 and 3+3, but you will have different programs to show Facebook pages and to play Pokemon Go. If you want to do the same thing to similar data, then you want to write your DAG once and maybe only change environment arguments (DB connection, date, etc.). Airflow is perfectly suitable for that.
You do not need to create a new DAG every time, if the structure of the graph is the same.
Airflow DAGs are created via code, so you are free to create a code structure that allows you to pass in arguments each time. How you do that will require some creative thinking.
You could, for example, create a web form that accepts the arguments, stores them in a DB and then schedules the DAG with the Airflow REST API. The DAG code would then need to be written to retrieve the params from the database.
There are several other ways to accomplish what you are asking, they all just depend on your use case.
One caveat: the Airflow scheduler does not perform well if you change the start date of a DAG. For your idea above you will need to set the start date earlier than your first DAG run and set the schedule interval to None. This way you have a start date that doesn't change and dynamically triggered DAG runs.
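As a sketch of that idea (assuming an Airflow 1.x installation with the experimental REST API enabled; the webserver address, DAG id and arguments are placeholders), the web form's backend could trigger a run roughly like this:

    import json

    import requests

    AIRFLOW_URL = "http://localhost:8080"  # assumed webserver address
    DAG_ID = "user_pipeline"               # hypothetical DAG id

    # Arguments collected from the web form / stored in the database
    run_conf = {"user_id": 123, "input_path": "s3://bucket/file.csv"}

    # Trigger a new DAG run via the experimental REST API (Airflow 1.x)
    response = requests.post(
        "%s/api/experimental/dags/%s/dag_runs" % (AIRFLOW_URL, DAG_ID),
        data=json.dumps({"conf": run_conf}),
        headers={"Content-Type": "application/json"},
    )
    response.raise_for_status()
    print(response.json())

The DAG itself keeps a fixed start_date in the past and schedule_interval=None, so every run is a dynamically triggered one.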

How does PeopleSoft App Engine program flow occur

I am learning more about PeopleSoft Application Engine program flow. From what I've read in PeopleBooks, any actions within a step that specify a Do Select, Do When or Do While perform a looping activity, where all subsequent actions (within that step) are looped through one row at a time.
I have seen some App Engine programs, including the one below, where a Do Select action occurs in a step, followed by a Call Section action that executes another section of the program. Does this mean that the loop still iterates over the called section one row at a time, just like any other actions would be repeated within the calling step?
My second question is specific to the App Engine program below. In the highlighted PeopleCode action at the bottom of the program, you can see it runs PeopleCode to check/compare data elements and then Exit. My question is whether this code runs within the context of the looping action occurring above, executing one row at a time, or whether it executes by looking at everything in the buffer at the same time? I would think it can only process row by row, as it needs to correctly exit/break from the step. Hopefully my question makes sense, but I'm happy to clarify if needed. Thanks!
Both of your assumptions are correct.
If you call another program section within a Do ..., then that call gets executed once for every row that is returned from the Do .... Within the context of the called section, the data in your state tables and temp tables will be the same as they were when you hit the Call Section action.
When you execute a PeopleCode action, it executes with whatever data is in the state records and temp tables at that time.

Is Airflow a good fit for DAG that doesn’t care about execution date/time?

The API in Airflow seems to suggest it is built around backfilling, catching up, and scheduling to run regularly at an interval.
I have an ETL that extracts data on S3, with the versions of the previous node (where the data comes from) in the DAG. For example, here are the nodes of the DAG:
ImageNet-mono
ImageNet-removed-red
ImageNet-mono-scaled-to-100x100
ImageNet-removed-red-scaled-to-100x100
where ImageNet-mono is the previous node of ImageNet-mono-scaled-to-100x100, and
where ImageNet-removed-red is the previous node of ImageNet-removed-red-scaled-to-100x100.
Both of them go through the scaled-to-100x100 transformation pipeline but produce different data, since the input is different.
As you can see, there is no date involved. Is Airflow a good fit?
EDIT
Currently, the graph is simple enough to be managed manually, with fewer than 10 nodes. The nodes won't run at a regular interval. Instead, as soon as someone updates the code for a node, I have to run the downstream nodes manually one by one: python GetImageNet.py removed-red, then python scale.py 100 100 ImageNet-removed-red, and then python scale.py 100 100 ImageNet-mono. I am looking for a way to manage the graph with a one-click trigger for the run.
I think it's fine to use Airflow as long as you find it useful to use the DAG representation. If your DAG does not need to be executed on a regular schedule, you can set the schedule to None instead of a crontab. You can then trigger your DAG via the API or manually via the web interface.
If you want to run specific tasks you can trigger your DAG and mark tasks as success or clear them using the web interface.
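For the ImageNet example above, a rough sketch of such a DAG might look like the following (the commands are taken from the question where available; the mono extraction command and the task/DAG ids are assumptions):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        dag_id="imagenet_pipeline",       # hypothetical DAG id
        start_date=datetime(2019, 1, 1),  # fixed date in the past
        schedule_interval=None,           # no schedule; trigger manually or via the API
    )

    mono = BashOperator(
        task_id="imagenet_mono",
        bash_command="python GetImageNet.py mono",  # assumed command for the mono node
        dag=dag,
    )

    removed_red = BashOperator(
        task_id="imagenet_removed_red",
        bash_command="python GetImageNet.py removed-red",
        dag=dag,
    )

    mono_scaled = BashOperator(
        task_id="imagenet_mono_scaled_to_100x100",
        bash_command="python scale.py 100 100 ImageNet-mono",
        dag=dag,
    )

    removed_red_scaled = BashOperator(
        task_id="imagenet_removed_red_scaled_to_100x100",
        bash_command="python scale.py 100 100 ImageNet-removed-red",
        dag=dag,
    )

    # Each scaled node depends only on the node that produces its input
    mono >> mono_scaled
    removed_red >> removed_red_scaled

A single "Trigger DAG" click in the web UI (or one trigger call) then runs the whole graph in the right order.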

Control-M: is it possible to continue running if the first job fails

I have several jobs that will run in sequence. Is it possible to create a dependency between them only for completion, rather than requiring that the prior job complete successfully?
If a job fails, it should remain red, and processing should move on to the next job and continue running.
It is mandatory that these jobs run in sequence and not in parallel.
As Mark outlined, you can simply create an On-Do action within the parent job to add a condition when the job ends Not OK. The parent job will still go red and the successor job will kick off.
Yes, on the Actions tab you create an On/Do step and specify that when the job ends Not OK it should add the output condition. In this way the next job will run (in sequence) regardless of what happens to the predecessor job.

Control-M scheduling changes

What changes can we make to Control-M job scheduling to minimize charges, given that we are charged on the basis of the number of jobs ordered per day in the active schedule?
This is costing us a lot.
If some of your jobs are commands and share broad characteristics (node ID, user, no alerts), then use conditional operators. For example, linking commands with a semicolon means that each command is executed once; linking with && means that the second command is only executed if the first runs successfully.
As Alex has mentioned, this is a broad area and difficult to answer precisely, but below are a few tips that can be considered.
1. Check whether the same script is being run by different jobs. These can be combined with the help of the Scheduling tab.
2. File watcher jobs. If there is a requirement to check for an incoming file and then trigger a specific job to process the file (this makes up 2 jobs: Job1 - file watcher, Job2 - file processing), this can be achieved using AFT jobs. AFT jobs combine these two functions in one.
3. Low-priority jobs, where alerting is not required, can be moved to Unix/shell scripts if jobs are costly.
4. If Job2 succeeds Job1, and Job2 has only one IN condition (i.e. from Job1), then rather than having two jobs, the script of Job2 can be called from the script of Job1. Ultimately we are doing two functions in Job1. Also, if the Job2 script fails, then Job1 will not get a success return code, and you can get the details from the log.
5. Keep archiving functionality in the scripts; there is no need to bring it into the Control-M jobs unless it is very important. And rather than doing it every day for the past 6 months, it would be better to schedule it once a week or once every two weeks.
6. Sort the jobs so that the regular jobs are in one table and the ad-hoc jobs (which run only on request) are in another. Keep the 'UserDaily' only for the jobs that are regular. Not keeping the 'UserDaily' for the ad-hoc jobs means they are not ordered into the EM daily, so you see only those jobs that run daily and not the ones that might or might not run.
Hope this helps.
You can use the ctmudly command to order only the user daily you want.
You can try using crontab in Unix to schedule non-priority jobs which don't need manual intervention or observation.
You can avoid the file watcher (FW) jobs by including file-checker logic in the shell script which executes the actual process.
