Airflow - trigger a specific task from external

In Apache Airflow, let's say I want to set up a DAG that has 3 tasks.
Task A
Task B
Task C
When the DAG gets scheduled, I want Task A to run, followed by Task B (when A completes).
However, I want Task C to run only when some external code triggers it. It shouldn't poll and wait for an external condition to be satisfied; it should wait until it receives an external request to start execution, and then run only if A and B have completed.
I also don't want to create another DAG for Task C.
Is this possible? How can I set this up? Will it require another task between B and C?
Thanks for any advice on how this can be achieved.

The best way to achieve this is to have two tasks leading into task C, so for example:
A >> [B, x] >> C
where x is a new task. You can then make x evaluate your external trigger condition.
For example, if you're waiting for a certain file to be delivered before C may execute, create x as a BranchPythonOperator and have it return C's task_id only if the file exists.
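Here is a minimal sketch of that layout, assuming Airflow 1.10-style imports and a hypothetical trigger-file path; the external code would signal C by creating the file before x runs:

import os

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator
from airflow.utils.dates import days_ago

TRIGGER_FILE = "/tmp/run_task_c"  # hypothetical path written by the external code

def choose_branch():
    # Run C only if the external trigger file exists; otherwise take the no-op path.
    return "task_c" if os.path.exists(TRIGGER_FILE) else "skip_c"

with DAG(dag_id="abc_dag", start_date=days_ago(1), schedule_interval="@daily") as dag:
    task_a = DummyOperator(task_id="task_a")
    task_b = DummyOperator(task_id="task_b")
    x = BranchPythonOperator(task_id="x", python_callable=choose_branch)
    task_c = DummyOperator(task_id="task_c")
    skip_c = DummyOperator(task_id="skip_c")  # no-op branch when there is no trigger

    task_a >> [task_b, x]
    x >> [task_c, skip_c]
    task_b >> task_c

With the default all_success trigger rule, task_c runs only when task_b succeeds and the branch selects it; if the branch picks skip_c, task_c is skipped for that run. Note the decision is made once, at the moment x executes, so the external request must arrive before then; there is no polling.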

Related

If my DAG fails for some reason, is it possible to re-run the DAG without losing the progress?

Let's assume my DAG converts a large data set from CSV format to Parquet. If the DAG fails partway through for some reason, is it possible to restore the progress when I re-run it?
It shouldn't start from scratch after I re-run the DAG.
An Airflow DAG is a collection of tasks, organized in a way that reflects their relationships and dependencies. So if you have a DAG with 3 tasks, A -> B -> C, and task C fails, you can just re-run C without re-running A and B. But if you re-run the whole DAG, that means you re-run task A along with all of its downstream tasks (B and C).
If you want to restore progress within a task, you can do that in your job logic, but this is not related to Airflow; it depends only on the technology you use and the logic you want to implement. For example, if your dataset consists of multiple files, you can keep a state store on cloud storage or in a database that records the processing state of each file; if a file is already processed, you skip it and move on to the next one.
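A minimal sketch of that idea in plain Python, using a local JSON file as the state store (in practice this could be an object on cloud storage or a database table) and pandas for the CSV-to-Parquet conversion; all names and paths are placeholders:

import json
import os

import pandas as pd

STATE_FILE = "conversion_state.json"  # hypothetical state store

def load_state():
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    return {}

def save_state(state):
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

def convert_all(csv_files):
    state = load_state()
    for path in csv_files:
        if state.get(path) == "done":
            continue  # already converted in a previous run, skip it
        pd.read_csv(path).to_parquet(path.replace(".csv", ".parquet"))
        state[path] = "done"
        save_state(state)  # persist after each file so a failure loses at most one file

On a re-run, only the files not yet marked "done" are converted, so the task picks up where it left off.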

Autosys - Release Job Dependency

I'm looking for a solution to the scenario below; I'm not sure whether it's even possible.
I've been working with CA Workload Automation D-series, which has a release-dependency option, but when I try to find the equivalent in AutoSys, I'm not able to find a solution.
Please check and help me with this.
Let's say there are 3 jobs in a box: A, B, C.
A runs first.
B runs on SUCCESS of A.
C runs on SUCCESS of B.
A is in the RU (running) state.
B is active and waiting for SUCCESS(A).
C is active and waiting for SUCCESS(B).
Now, is there any way we can let job C start executing (only for this run)?
We still want B to execute, but for now we don't want C to wait for SUCCESS of B.

Set multi-DAG dependencies in Airflow

I have 3 DAGs: A, B and C. DAG C should get triggered only after the tasks in DAGs A and B complete. Is there a way to implement this in Airflow? I am able to set a dependency between DAG A and C using the TriggerDagRunOperator. But when I also try to set a dependency between DAG B and C, C gets triggered when either A or B completes.
Can someone please help me solve this?
I understand that the ExternalTaskSensor operator can be used. But it continuously polls to check whether the tasks in DAGs A and B are complete, which might create a performance hit over a period of time.
You could add two more wait tasks in your DAG C and chain them as:
startop >> [wait_daga, wait_dagb] >> dagc
Below is a link to the Airflow docs:
https://airflow.apache.org/docs/stable/concepts.html
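A sketch of that layout, assuming the upstream DAGs are named dag_a and dag_b and that all three DAGs share the same schedule (ExternalTaskSensor matches on execution date by default; use its execution_delta parameter if the schedules differ). Running the sensors in reschedule mode releases the worker slot between pokes, which limits the polling cost raised in the question:

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.sensors.external_task_sensor import ExternalTaskSensor
from airflow.utils.dates import days_ago

with DAG(dag_id="dag_c", start_date=days_ago(1), schedule_interval="@daily") as dag:
    startop = DummyOperator(task_id="startop")
    wait_daga = ExternalTaskSensor(
        task_id="wait_daga",
        external_dag_id="dag_a",
        external_task_id=None,  # None means: wait for the whole DAG run
        mode="reschedule",      # free the worker slot between pokes
        poke_interval=60,
    )
    wait_dagb = ExternalTaskSensor(
        task_id="wait_dagb",
        external_dag_id="dag_b",
        external_task_id=None,
        mode="reschedule",
        poke_interval=60,
    )
    dagc = DummyOperator(task_id="dagc")  # stand-in for DAG C's real work

    startop >> [wait_daga, wait_dagb] >> dagc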

Run an Airflow task no matter what happens upstream

I have three tasks in one DAG.
Task A runs first. Task B runs if Task A is successful.
Task C has to run after Task B, but it does not depend on Task A's or Task B's success or failure.
Task C needs to run no matter what happens to Tasks A and B; however, it needs to run after A and B have completed.
Any ideas?
To have a task run after other tasks are done, but regardless of the outcome of their execution, set its trigger_rule parameter to all_done, like so:
my_task = MyOperator(task_id='my_task',
                     trigger_rule='all_done')
See the trigger rule documentation for more options.
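Here is a minimal end-to-end sketch of the A -> B -> C case above, assuming Airflow 1.10-style imports:

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.dates import days_ago

with DAG(dag_id="abc", start_date=days_ago(1), schedule_interval="@daily") as dag:
    task_a = DummyOperator(task_id="task_a")
    task_b = DummyOperator(task_id="task_b")
    # all_done: run once every upstream task has finished,
    # whether it succeeded, failed, or was skipped
    task_c = DummyOperator(task_id="task_c", trigger_rule="all_done")

    task_a >> task_b >> task_c

Even if task_a fails (putting task_b into an upstream_failed state), task_c still runs once both are in a terminal state.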

Create a task schedule that runs a task after another one finishes

Currently I have a task that runs every 5 minutes.
What I want is to have that task rerun every time it completes, with a 1-minute delay.
What I have in mind is to create multiple tasks, Task A and Task B; Task B will run after Task A completes, and vice versa. But I'm not sure how to execute that.
I have found a workaround for my situation: in a loop, I create Task A followed by Task B, with a delay in between, as sketched below.
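A sketch of that workaround, assuming PythonOperator tasks, a hypothetical fixed number of rounds, and a plain time.sleep for the delay (note that sleeping inside a task holds a worker slot for the full minute):

import time

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.utils.dates import days_ago

def do_work():
    pass  # the actual job goes here

def one_minute_delay():
    time.sleep(60)

with DAG(dag_id="ab_loop", start_date=days_ago(1), schedule_interval=None) as dag:
    previous = None
    for i in range(10):  # hypothetical number of A/B rounds
        task_a = PythonOperator(task_id=f"task_a_{i}", python_callable=do_work)
        delay = PythonOperator(task_id=f"delay_{i}", python_callable=one_minute_delay)
        task_b = PythonOperator(task_id=f"task_b_{i}", python_callable=do_work)
        if previous is not None:
            previous >> task_a
        task_a >> delay >> task_b
        previous = task_b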
