Apache Airflow: conditional task running before it is triggered

I'm having a problem with my DAG. I want to set it up so that if one task fails, another task runs and the entire run doesn't fail.
The code is proprietary so I can't post the code snippet. So sorry!
Task0 >> [Task1, Task2]
Task1 >> Task1a
If Task1 fails, I want Task2 to execute. If Task1 succeeds, I want Task1a to execute. My current code for Task2 looks like this:
task2 = DummyOperator(
    task_id='task2',
    trigger_rule='one_failed',
    dag=dag,
)
I've been playing around with the trigger_rule, but Task2 keeps running before Task1. It just runs right away.

Your operator is fine. The dependency is wrong.
Task0 >> [Task1, Task2] means that Task1 can run in parallel with Task2, and the trigger_rule='one_failed' of Task2 is checked against its direct upstream tasks. This means the rule is checked against the status of Task0, not against Task1.
To fix your issue you need to change the dependencies to:
Task0 >> Task1 >> Task2
Task1 >> Task1a
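Putting it together, a minimal runnable sketch of the fixed DAG (the DAG id, schedule, and dates are assumptions; the operators are DummyOperator, as in the question):

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy import DummyOperator  # Airflow 2.x import path

with DAG(
    dag_id='one_failed_example',  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    task0 = DummyOperator(task_id='task0')
    task1 = DummyOperator(task_id='task1')
    task1a = DummyOperator(task_id='task1a')  # default trigger rule: runs only if task1 succeeds
    task2 = DummyOperator(
        task_id='task2',
        trigger_rule='one_failed',  # fires only when a direct upstream (now task1) fails
    )

    task0 >> task1
    task1 >> task1a
    task1 >> task2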

Related

Airflow terminate EMR cluster

I am using an EMR cluster to run two jobs in parallel. Both jobs run in the same cluster. I have set the action_on_failure field to 'CONTINUE' so that if one task fails, the other still runs in the cluster. I want an end task, EMRTerminateCluster, to run after both of these tasks are completed, regardless of success or failure.
task1 >> [task2, task3] >> task4
I want my DAG to run in such a way that task4 only starts after task2 and task3 have completed. Is there any way to do this?
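A minimal sketch of one way to express this, using trigger_rule='all_done' so that task4 runs once both upstream tasks finish regardless of outcome (the DAG id is an assumption, and DummyOperator stands in for the real EMR step and terminate operators):

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
    dag_id='emr_all_done_example',  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    task1 = DummyOperator(task_id='task1')
    task2 = DummyOperator(task_id='task2')  # stand-in for the first EMR job
    task3 = DummyOperator(task_id='task3')  # stand-in for the second EMR job
    task4 = DummyOperator(                  # stand-in for the EMR terminate-cluster task
        task_id='task4',
        trigger_rule='all_done',            # run once all upstreams finish, success or failure
    )

    task1 >> [task2, task3] >> task4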

Airflow retry of multiple tasks as a group

I have a group of tasks that should run as a unit, in the sense that if any of the tasks in the group fails, the whole group should be marked as failed.
I want to be able to retry the group when it has failed.
For example, I have a DAG with these tasks:
taskA >> (taskB >> taskC) >> taskD
I want to say that (taskB >> taskC) is a group.
If either taskB or taskC fails, I want to be able to rerun the whole group (taskB >> taskC).
This is a two-part question.
First, in Airflow a downstream task cannot affect an upstream task. Assuming a structure of:
taskA >> taskB >> taskC >> taskD
then if taskB is successful and taskC fails, Airflow cannot change the state of taskB to failed.
Second, clearing (rerunning) a TaskGroup is a feature that is currently not available. There is an open feature request for it in the Airflow repo.
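A minimal sketch of declaring the group with TaskGroup (available since Airflow 2.0; the DAG id and dates are assumptions). This gives the group a single collapsible node in the graph and makes it easy to clear taskB and taskC together by hand, but it does not add group-level retry:

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.utils.task_group import TaskGroup

with DAG(
    dag_id='group_example',  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    taskA = DummyOperator(task_id='taskA')
    taskD = DummyOperator(task_id='taskD')

    with TaskGroup(group_id='group1') as group1:  # the (taskB >> taskC) unit
        taskB = DummyOperator(task_id='taskB')
        taskC = DummyOperator(task_id='taskC')
        taskB >> taskC

    taskA >> group1 >> taskD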

What is the usage of DummyOperator in Airflow?

I'm new to Airflow and I know that DummyOperator just does nothing.
So what is the scenario for DummyOperator?
When would you typically use it?
A common use is to create simplified workflows. Consider an example.
task_1 >> task_3
task_2 >> task_3
task_1 >> task_4
task_2 >> task_4
Technically you want task_3 and task_4 to be executed only after both task_1 and task_2 are completed. But when you look at the graph, it is not super intuitive.
Solution? You can improve readability (not code readability, but the readability of the graph, and thereby the workflow) by adding a task_dummy after task_1 and task_2 and running task_3 and task_4 after task_dummy. When a new user looks at the graph, they will immediately understand the workflow. The modified workflow will be as follows.
task_1 >> task_dummy << task_2
task_dummy >> task_3
task_dummy >> task_4
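As a sketch, the modified workflow could look like this in a DAG file (assuming Airflow 2.x, where DummyOperator lives in airflow.operators.dummy; the DAG id and dates are assumptions):

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
    dag_id='dummy_join_example',  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    task_1 = DummyOperator(task_id='task_1')
    task_2 = DummyOperator(task_id='task_2')
    task_dummy = DummyOperator(task_id='task_dummy')  # join point, does nothing itself
    task_3 = DummyOperator(task_id='task_3')
    task_4 = DummyOperator(task_id='task_4')

    [task_1, task_2] >> task_dummy >> [task_3, task_4]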

Rerun an Airflow DAG from a middle task and continue to the end of all downstream tasks (resume an Airflow DAG from any task)

Hi, I am new to Apache Airflow. I have a DAG of dependencies, let's say:
Task A >> Task B >> Task C >> Task D >> Task E
1. Is it possible to run an Airflow DAG from a middle task, let's say Task C?
2. Is it possible to run only a specific branch in the case of a branching operator in the middle?
3. Is it possible to resume an Airflow DAG from the last failed task?
4. If not possible, how do you manage large DAGs and avoid rerunning redundant tasks?
Please provide me suggestions on how to implement this if possible.
1. You can't do it manually. If you use a BranchPythonOperator, you can skip tasks up to the task you wish to start with, according to the conditions set in the BranchPythonOperator.
2. Same as 1.
3. Yes. You can clear tasks upstream until the root, or downstream until all leaves of the node.
You can do something like:
Task A >> Task B >> Task C >> Task D
Task C >> Task E
Where C is the branch operator.
For example:
from datetime import date
from airflow.operators.python import BranchPythonOperator  # Airflow 2.x import path

def branch_func():
    if date.today().weekday() == 0:  # Monday
        return 'task id of D'
    else:
        return 'task id of E'

Task_C = BranchPythonOperator(
    task_id='branch_operation',
    python_callable=branch_func,
    dag=dag)
This will be the task sequence on Monday:
Task A >> Task B >> Task C >> Task D
This will be the task sequence for the rest of the week:
Task A >> Task B >> Task C >> Task E
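Pulling the pieces together, a runnable sketch (the DAG id and the task ids 'task_d' and 'task_e' are assumptions; note that the string returned by the callable must match the task_id of the branch to follow):

from datetime import date, datetime
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import BranchPythonOperator

def branch_func():
    # follow Task D on Mondays, Task E otherwise
    if date.today().weekday() == 0:
        return 'task_d'
    return 'task_e'

with DAG(
    dag_id='branch_example',  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    task_a = DummyOperator(task_id='task_a')
    task_b = DummyOperator(task_id='task_b')
    task_c = BranchPythonOperator(
        task_id='branch_operation',
        python_callable=branch_func,
    )
    task_d = DummyOperator(task_id='task_d')
    task_e = DummyOperator(task_id='task_e')

    task_a >> task_b >> task_c >> [task_d, task_e]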

Schedule a job in Unix

I am pretty new to Unix environment.
I am trying to schedule two tasks on a Unix server. The second task depends on the result of the first. I want to run the first task; if there is no error, the second task should run automatically. But if the first task fails, I want to reschedule the first task after 30 minutes.
I have no idea where to start from.
You don't need cron for this. A simple shell script is all you need:
#!/bin/sh
while :; do            # loop until the break statement is hit
    if task1; then     # if task1 is successful
        task2          # then run task2
        break          # and we're done
    else               # otherwise task1 failed
        sleep 1800     # so wait 30 minutes and repeat
    fi
done
Note that task1 must indicate success with an exit status of 0, and failure with nonzero.
As Wumpus sharply observes, this can be simplified to
#!/bin/sh
until task1; do
    sleep 1800
done
task2
