Is there a way to ignore failed task and proceed to next step after let's say 2 re-tries?
Example;
t1= SomeOperator(...)
t2= SomeOperator(...)
t2.set_upstream(t1)
# if t1 fails re-try 2 times and proceed to t2
# else if t1 success then proceed to t2 as usual
Take a look at Airflow's trigger rules.
By default, the trigger rule for every task is 'all_success', meaning the task will only get executed when all directly upstream tasks have succeeded.
What you would want here is the trigger rule 'all_done', meaning all directly upstream tasks are finished, no matter whether they failed or succeeded.
But be careful, as this also means that if a task that is not directly upstream fails, and the tasks following that task get marked as 'upstream_failed', the task with this trigger rule will still get executed.
So in your case, you would have to set retries=2 for t1 and trigger_rule='all_done' for t2.
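For illustration, here is a minimal sketch of that setup, assuming Airflow 2-style imports, a hypothetical dag id, and BashOperator tasks with placeholder commands (t1 deliberately fails so the retry path is exercised):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.trigger_rule import TriggerRule

with DAG(
    dag_id="retry_then_proceed",      # hypothetical dag id
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # t1 is retried twice before being marked as failed
    t1 = BashOperator(
        task_id="t1",
        bash_command="exit 1",         # placeholder command that always fails
        retries=2,
    )
    # t2 runs once t1 is done, whether it succeeded or failed
    t2 = BashOperator(
        task_id="t2",
        bash_command="echo 'running t2'",
        trigger_rule=TriggerRule.ALL_DONE,
    )
    t1 >> t2

Passing the string 'all_done' instead of TriggerRule.ALL_DONE works the same way.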
Related
I have two tasks A and B. Task A failed once but the retry succeeded and is marked as a success (green). I would expect Task B to run normally since Task A's retry succeeded, but it is marked as upstream_failed and was not triggered. Is there a way to fix this behavior?
The Task B has an ALL_SUCCESS trigger rule.
I am using Airflow 2.0.2 on AWS (MWAA).
I have tried restarting the scheduler.
The upstream_failed state comes from the scheduler flow, when a task's dependencies are set to a failed state; you can check the states in the Task Instances view.
In retry mode:
Task A will stay in the up_for_retry state until it exceeds its retries limit.
If trigger_rule is set to all_success (the default trigger rule), Task B will not trigger until Task A has finished; if everything is running correctly, it should then run as expected.
Could you add the DAG implementation?
I'm trying to see whether or not there is a straightforward way to not start the next dag run if the previous dag run has failures. I already set depends_on_past=True, wait_for_downstream=True, max_active_runs=1.
What I have is tasks 1, 2, and 3, which:
create resources
run job
tear down resources
Task 3 always runs with trigger_rule=all_done to make sure we always tear down resources. What I'm seeing is that if task 2 fails and task 3 then succeeds, the next dag run starts: with wait_for_downstream=False it runs task 1, since the previous task 1 was a success, and with wait_for_downstream=True it does not start the dag, which is exactly what I expect.
The problem is that if tasks 1 and 2 succeed but task 3 fails for some reason, the next dag run still starts and task 1 runs immediately, because both task 1 and task 2 (the one wait_for_downstream checks) were successful in the previous run. This is the worst-case scenario, because task 1 creates resources and then the job is never run, so the resources just sit there allocated.
What i ultimately want is for any failure to stop the dag from proceeding to the next dag run. If my previous dag run is marked as fail then the next one should not start at all. Is there any mechanism for doing this?
My current two best-effort ideas are:
Use a SubDAG so that there is only one task in the parent dag, and therefore the next dag run will never start at all if the previous single-task dag failed. This seems like it would work, but I've seen mixed reviews on the use of the SubDagOperator.
Add some logic to the dag as a first task that manually queries the DB to see if the dag has previous failures, and fail the task if it does. This seems hacky and not ideal, but it could work as well.
Is there any out-of-the-box solution for this? It seems fairly standard to not want to continue on failure, and to not want step 1 of run 2 to start if not all steps of run 1 were successful or if run 1 itself was marked as failed.
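For reference, a minimal sketch of the setup described above, assuming Airflow 2-style imports, a hypothetical dag id, and placeholder callables:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.trigger_rule import TriggerRule

# Hypothetical placeholders for the real work done by each step.
def create_resources():
    print("allocating resources")

def run_job():
    print("running job")

def tear_down_resources():
    print("tearing down resources")

with DAG(
    dag_id="resource_job_teardown",    # hypothetical dag id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    max_active_runs=1,
    catchup=False,
    default_args={
        "depends_on_past": True,
        "wait_for_downstream": True,
    },
) as dag:
    task1 = PythonOperator(task_id="create_resources", python_callable=create_resources)
    task2 = PythonOperator(task_id="run_job", python_callable=run_job)
    # Always tear down, even when an upstream task failed.
    task3 = PythonOperator(
        task_id="tear_down_resources",
        python_callable=tear_down_resources,
        trigger_rule=TriggerRule.ALL_DONE,
    )
    task1 >> task2 >> task3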
The reason depends_on_past is not helping you is that it's a task parameter, not a dag parameter.
Essentially what you're asking for is for the dag to be disabled after a failure.
I can imagine valid use cases for this, and maybe we should add an AirflowDisableDagException that would trigger this.
The problem with this is you risk having your dag disabled and not noticing for days or weeks.
A better solution would be to build recovery or abort logic into your pipeline so that you don't need to disable the dag.
One way you can do this is to add a cleanup task at the start of your dag, which checks whether resources were left sitting there and tears them down if appropriate, and fails the dag run immediately if it hits an error it cannot recover from. You can consider using an Airflow Variable or XCom to store the state of your resources.
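A minimal sketch of that cleanup idea, assuming a hypothetical resources_allocated Variable flag maintained by the other tasks and a hypothetical tear_down_resources() helper:

from airflow.exceptions import AirflowFailException
from airflow.models import Variable

def cleanup_leftover_resources():
    # Hypothetical flag written by the create/teardown steps of the previous run.
    leftover = Variable.get("resources_allocated", default_var="false") == "true"
    if not leftover:
        return
    try:
        tear_down_resources()              # hypothetical teardown helper
        Variable.set("resources_allocated", "false")
    except Exception as err:
        # Could not recover: fail this task (and the run) immediately, without retries.
        raise AirflowFailException("leftover resources could not be torn down") from err

Wired in as the first PythonOperator of the dag, this stops the run before task 1 allocates anything new when the previous run left resources behind; AirflowFailException fails the task without retrying.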
The other option, notwithstanding the risks, is the disable dag approach: if your process fails to tear down resources appropriately, disable the dag. Something along these lines should work:
from airflow.models import BaseOperator, DagModel


class MyOp(BaseOperator):
    def disable_dag(self):
        # Pause (disable) the dag this task belongs to.
        orm_dag = DagModel(dag_id=self.dag_id)
        orm_dag.set_is_paused(is_paused=True)

    def execute(self, context):
        try:
            print('something')  # the actual teardown work goes here
        except TeardownFailedError:  # placeholder exception for a failed teardown
            self.disable_dag()
The ExternalTaskSensor may work, with an execution_delta of datetime.timedelta(days=1). From the docs:
execution_delta (datetime.timedelta) – time difference with the previous execution to look at, the default is the same execution_date as the current task or DAG. For yesterday, use [positive!] datetime.timedelta(days=1). Either execution_delta or execution_date_fn can be passed to ExternalTaskSensor, but not both.
I've only used it to wait for upstream DAGs to finish, but it seems like it should work self-referentially, because the dag_id and task_id are arguments to the sensor. You'll want to test it first, of course.
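A rough, untested sketch of that self-referencing sensor, assuming a daily schedule, Airflow 2-style imports, and the hypothetical dag/task ids used above:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.sensors.external_task import ExternalTaskSensor

with DAG(dag_id="resource_job_teardown", start_date=datetime(2023, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    # Block this run until yesterday's teardown task in this same dag has succeeded.
    wait_for_previous_run = ExternalTaskSensor(
        task_id="wait_for_previous_run",
        external_dag_id="resource_job_teardown",  # this dag's own id
        external_task_id="tear_down_resources",   # hypothetical teardown task id
        execution_delta=timedelta(days=1),        # look at the previous daily run
        timeout=60 * 60,                          # give up (and fail) after an hour
    )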
Is it possible to somehow extract the task instance objects for upstream tasks from the context passed to python_callable in a PythonOperator? The use case is that I would like to check the status of two tasks immediately after branching, to see which one ran and which one was skipped, so that I can query the correct task for its return value via XCom.
Thanks
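For illustration, a sketch of the kind of callable described, assuming two hypothetical branch tasks branch_a and branch_b and using the dag_run object available in the context:

def pick_branch_result(**context):
    dag_run = context["dag_run"]
    ti = context["ti"]
    # Look up the task instances of the two hypothetical branch tasks.
    state_a = dag_run.get_task_instance("branch_a").state
    state_b = dag_run.get_task_instance("branch_b").state
    # Pull the return value from whichever branch actually ran.
    if state_a == "success":
        source = "branch_a"
    elif state_b == "success":
        source = "branch_b"
    else:
        return None
    return ti.xcom_pull(task_ids=source)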
I have a DAG with 5 tasks A, B, C, D, E, and 5 tasks triggered when those fail, one for each: A_f, B_f, C_f, D_f and E_f (and similarly five on success). Finally there is task X, which writes failure results to a database.
Let's say 2 of the first five tasks fail (A and D), so only A_f and D_f get triggered. What can I do to run task X?
Will all_done work, even if some of the upstream tasks were never triggered? I am not so sure about it.
Yes, all_done should work. None of Task X's upstream tasks should ever have a state of None for a given dag run, because tasks' states are inferred from previous tasks' states (i.e. the skipped state is propagated, and any children of a failed task are set to upstream_failed), so the all_done trigger will fire.
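A trimmed sketch of that wiring for just two of the tasks, assuming Airflow 2-style imports and hypothetical no-op callables: the failure handlers use one_failed, and X uses all_done.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.trigger_rule import TriggerRule

def noop():
    # Hypothetical placeholder for the real work.
    pass

with DAG(dag_id="failure_fanout", start_date=datetime(2023, 1, 1),
         schedule_interval=None, catchup=False) as dag:
    a = PythonOperator(task_id="A", python_callable=noop)
    d = PythonOperator(task_id="D", python_callable=noop)

    # Failure handlers: run only if their upstream task failed, otherwise skipped.
    a_f = PythonOperator(task_id="A_f", python_callable=noop,
                         trigger_rule=TriggerRule.ONE_FAILED)
    d_f = PythonOperator(task_id="D_f", python_callable=noop,
                         trigger_rule=TriggerRule.ONE_FAILED)

    # X runs once every direct upstream task has finished or been skipped.
    x = PythonOperator(task_id="X", python_callable=noop,
                       trigger_rule=TriggerRule.ALL_DONE)

    a >> a_f >> x
    d >> d_f >> x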
My DAG looks like this:
task1 >> task2 >> task3
and task2 has failed.
I would like to restart the dagrun from the last failure point which is task2, and I understand I can use "clear" for that.
The issue is that when I use clear, the execution attempt is cleared, and looking back I have no record of the failure anymore.
I'm wondering if I can rerun, but still keep the failure history.
The failure data is available in a tab inside the log.
For example, if I have auto retries set to 3 and all 3 fail, there will be 3 tabs in the Airflow UI Logs.
Similarly, restarting a failed task will log the new runs in a new tab.
It will be under the heading of "Log by attempts".
Of course, this also means that you can view them in the actual log files themselves. There is attempt numbering in the log files to indicate the divide between attempts.